Abstract. We consider the estimation of high-dimensional network structures from partially observed Markov random field data using a penalized pseudo-likelihood approach. We fit a misspecified model obtained by ignoring the missing data problem. We study the consistency of the estimator and derive a bound on its rate of convergence. The results obtained relate the rate of convergence of the estimator to the extent of the missing data problem. We report some simulation results that empirically validate some of the theoretical findings.



1. Introduction and statement of the results 
The problem of high-dimensional network structure estimation has recently attracted a lot of attention. This paper focuses mainly on Markov random fields (MRF) for non-Gaussian data. The problem can be described as follows. Let $(X^{(1)},\ldots,X^{(n)})$ be $n$ i.i.d. random variables, where $X^{(i)}=(X^{(i)}_1,\ldots,X^{(i)}_p)$ is a $p$-dimensional vector of dependent random variables with joint density

$$f_\theta(x_1,\ldots,x_p)=\frac{1}{Z_\theta}\exp\Big\{\sum_{s=1}^{p}\big(A(x_s)+\theta(s,s)B_0(x_s)\big)+\sum_{1\le s<s'\le p}\theta(s,s')B(x_s,x_{s'})\Big\}, \qquad (1)$$

for known functions $A, B_0:\mathsf{X}\to\mathbb{R}$ and a symmetric function $B:\mathsf{X}\times\mathsf{X}\to\mathbb{R}$, where $\mathsf{X}$ is a compact (generally finite) set. The real-valued symmetric matrix $\theta=\{\theta(s,s'),\ 1\le s,s'\le p\}$ is the network structure and is the parameter of interest. The term $Z_\theta$ is a normalizing constant. This type of statistical model was pioneered by J. Besag (Besag (1974)) under the name of auto-model, and we adopt the same name here, although Besag's auto-models correspond to setting $B(x,y)=xy$ above. The nice feature of model (1) is that for any $1\le s\le p$, the conditional density of $X_s$ given $\{X_j,\ j\neq s\}=x\in\mathsf{X}^{p-1}$ is



$$f^{(s)}_\theta(u\,|\,x)=\frac{1}{Z^{(s)}_\theta}\exp\Big\{A(u)+\theta(s,s)B_0(u)+\sum_{j\neq s}\theta(s,j)B(u,x_j)\Big\}, \qquad (2)$$



for a normalizing constant $Z^{(s)}_\theta=Z^{(s)}_\theta(x)$. Therefore, $\theta(s,j)=0$ implies that $X_s$ and $X_j$ are conditionally independent given the other variables $X_k$, $k\notin\{s,j\}$. Thus estimating $\theta$ provides us with the dependence structure and the magnitude of the dependence between these variables.
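For a finite sample space, (2) is simply a normalized exponential over $\mathsf{X}$, and the conditional independence statement can be checked directly. The following is a minimal sketch, assuming $\mathsf{X}$ finite with counting measure; the function name `conditional_pmf` and the parameter values are illustrative and not taken from the paper.

```python
import math


def conditional_pmf(theta, s, x, states, A, B0, B):
    """Full conditional pmf of X_s given the other nodes, following (2).

    Assumes a finite sample space `states` with counting measure.
    theta : symmetric p x p matrix (list of lists)
    x     : current configuration of length p (x[s] is ignored)
    A, B0 : functions on the sample space; B : symmetric function of two values
    """
    p = len(theta)
    energies = []
    for u in states:
        e = A(u) + theta[s][s] * B0(u)
        e += sum(theta[s][j] * B(u, x[j]) for j in range(p) if j != s)
        energies.append(e)
    m = max(energies)                        # shift for numerical stability
    weights = [math.exp(e - m) for e in energies]
    z = sum(weights)                         # normalizing constant (after the shift)
    return {u: w / z for u, w in zip(states, weights)}


# Auto-logistic choice A = 0, B_0(u) = u, B(u, v) = uv (illustrative values)
A = lambda u: 0.0
B0 = lambda u: u
B = lambda u, v: u * v
theta = [[0.2, 0.5, 0.0],
         [0.5, -0.1, 0.3],
         [0.0, 0.3, 0.4]]
# theta[0][2] = 0, so the conditional law of node 0 ignores x[2]
print(conditional_pmf(theta, 0, [0, 1, 0], [0, 1], A, B0, B))
print(conditional_pmf(theta, 0, [0, 1, 1], [0, 1], A, B0, B))
```

Both printed distributions coincide, since node 0 has no edge to node 2.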

This paper focuses on the situation where the outcomes $X_j$ are either categorical ($\mathsf{X}$ is a finite set) or continuous and bounded ($\mathsf{X}\subset\mathbb{R}^{m_X}$ is compact). Based on $(X^{(1)},\ldots,X^{(n)})$, the true network structure, denoted $\theta_\star=\{\theta_\star(s,s'),\ 1\le s,s'\le p\}$, can be consistently estimated using a number of methods, even when the number of entries of $\theta_\star$ is much larger than $n$ (Höfling and Tibshirani (2009); Ravikumar et al. (2010); Guo et al. (2010)). For computational tractability, a pseudo-likelihood approach is often preferred, even though it incurs a certain loss of efficiency. In the case of the auto-logistic model (where $\mathsf{X}=\{0,1\}$, $A(u)=0$, $B_0(u)=u$, $B(u,v)=uv$), Guo et al. (2010) show that the $\ell^1$-penalized pseudo-likelihood estimator of $\theta_\star$ is consistent, with an $\ell^2$ rate of convergence bounded from above by $\alpha^{-1}\sqrt{a\log p/n}$, where $a$ is the number of non-zero elements of $\theta_\star$ and $\alpha$ is the smallest eigenvalue of the information matrix. Ravikumar et al. (2010) obtained similar results for a one-neighborhood-at-a-time $\ell^1$-penalized pseudo-likelihood estimator. Xue et al. also derived some properties of the oracle estimator with the SCAD penalty.

In many situations where network estimation is needed, the network data are only partially observed because certain nodes are missing from the sample. For example, in social network analysis, some close friends or siblings might not be part of the survey. As another example, in protein-protein networks, the analysis is often restricted to the specific subgroup of proteins that is believed to play a role in a given biological function. In doing so, some important but not yet identified proteins might be omitted from the analysis. This paper considers the problem of network estimation from partially observed MRF data. The issue cannot be completely addressed by simply ignoring the missing nodes and assuming that the observed data follow an MRF. This is because, unlike Gaussian distributions, Markov random field distributions are not closed under marginalization. For example, if there exist $r$ additional nodes denoted $p+1,\ldots,p+r$ such that the joint distribution of $(X_1,\ldots,X_p,X_{p+1},\ldots,X_{p+r})$ is an auto-model with network structure $\{\theta(s,s'),\ 1\le s,s'\le p+r\}$, then the joint (marginal) distribution of $(X_1,\ldots,X_p)$ is not of the form (1) in general. To take a specific example, if $r=1$, $A=B_0=0$ and $B(x,y)=B(x)B(y)$, then the joint (marginal) distribution of $(X_1,\ldots,X_p)$ is the mixture distribution



$$f_\theta(x_1,\ldots,x_p)=Z_\theta^{-1}\sum_{i\in\mathsf{X}}\exp\Big\{\sum_{s=1}^{p}\theta_i(s)B(x_s)+\sum_{1\le s<s'\le p}\theta(s,s')B(x_s)B(x_{s'})\Big\},$$




where $\theta_i(s)=B(i)\theta(s,p+1)$. Furthermore, the conditional distributions are altered. Indeed, keeping the assumption $r=1$, if $|\theta(s,p+1)|>0$, then the conditional density of $X_s$ given $\{X_\ell,\ \ell\neq s,\ 1\le\ell\le p\}$ depends not only on $X_\ell$ for all $\ell$ such that $|\theta(s,\ell)|>0$, but also on $X_k$ for all $k$ such that $|\theta(k,p+1)|>0$. However, if $\theta(s,p+1)=0$, the conditional density of $X_s$ given $\{X_\ell,\ \ell\neq s,\ 1\le\ell\le p\}$ remains (2). This suggests that if we ignore the missing nodes and fit the misspecified model (1) to the observed data, the resulting estimator will be well behaved to the extent that the missing data problem is limited, that is, to the extent that $\sum_{s=1}^{p}|\theta_\star(s,p+1)|$ is small in the case $r=1$ considered above.
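The loss of the Markov property under marginalization can be checked by brute force on a toy example. The sketch below assumes a three-node auto-logistic model ($A=0$, $B_0(u)=u$, $B(u,v)=uv$) in which nodes 1 and 2 are not linked, $\theta(1,2)=0$, but share the neighbour node 3, which is then treated as missing. The parameter values and function names are arbitrary illustrative choices.

```python
import itertools
import math


def joint_autologistic(theta):
    """Exact joint pmf of a small auto-logistic model, by enumeration."""
    p = len(theta)
    weights = {}
    for x in itertools.product([0, 1], repeat=p):
        e = sum(theta[s][s] * x[s] for s in range(p))
        e += sum(theta[s][t] * x[s] * x[t] for s in range(p) for t in range(s + 1, p))
        weights[x] = math.exp(e)
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


def cond_x1_given_x2(theta, x2):
    """P(X_1 = 1 | X_2 = x2) in the marginal of (X_1, X_2), node 3 summed out."""
    pmf = joint_autologistic(theta)
    num = sum(v for x, v in pmf.items() if x[0] == 1 and x[1] == x2)
    den = sum(v for x, v in pmf.items() if x[1] == x2)
    return num / den


# Nodes 1 and 2 (indices 0 and 1) are not linked; both are linked to node 3
theta = [[-0.5, 0.0, 1.5],
         [0.0, -0.5, 1.5],
         [1.5, 1.5, -1.0]]
print(cond_x1_given_x2(theta, 0), cond_x1_given_x2(theta, 1))
```

With these values, $P(X_1=1\,|\,X_2=x_2)$ changes with $x_2$ even though $\theta(1,2)=0$; setting $\theta(1,3)=0$ removes the dependence, in line with the discussion above.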

The goal of the paper is to formalize this idea. In order to do so, we consider an infinite-volume Markov random field model, where only part of the field is observed, and we fit the misspecified model (1) using a penalized pseudo-likelihood approach. We derive a general consistency result and show that under certain conditions, the estimator converges at the rate $(\sqrt{a_n\log p_n/n}+\tau_n b_n)/\alpha_n$, where $p_n$ is the number of observed nodes, $a_n$ is the number of non-zero entries of the true network, $\alpha_n$ is the smallest eigenvalue of the Fisher information matrix, and where the term $\tau_n b_n$ quantifies the effect of the missing nodes (see Theorem 1.4 for a more rigorous statement). We conclude that the estimator $\hat\theta_n$ is robust to a small to moderate amount of missing data. We report some simulation results that are consistent with these findings. In practical situations where MRFs are used, it is often unclear whether one is dealing with a partially observed field with important missing nodes. The above discussion thus stresses the need for methods of detecting the existence of missing nodes in Markov random field data. We leave this problem for future research, as it requires a better understanding of the asymptotic behavior of $\hat\theta_n$.

The paper is organized as follows. The infinite-volume Markov random field setting and the estimators are presented in Section 1.1. The paper presents two main results: Theorem 1.2 (and Corollary 1.3) on the consistency of the estimator, and Theorem 1.4 (and Corollary 1.5) on its rate of convergence. These results are presented in Section 1.2. The simulation example is presented in Section 1.3. Section 2 develops the technical proofs.

1.1. The setting. Let $(\mathsf{X},\mathcal{E},\mu)$ be a measure space. We assume that $\mathsf{X}$ is a compact subset of $\mathbb{R}^{m_X}$, $\mathcal{E}$ its Borel sigma-algebra, and $\mu$ a finite measure. The compactness of $\mathsf{X}$ is with respect to the usual Euclidean metric. $\mathsf{X}$ is the sample space of the observations $X_s$. The main case of interest is the case where $\mathsf{X}$ is finite. Let $S$ be a countably infinite set (typically, $S$ is a subset of the Euclidean space $\mathbb{R}^{m_S}$ for some finite integer $m_S\ge 1$). The set $S$ represents the nodes of the network. We assume that $S$ is equipped with a linear ordering $\prec$ (for example, the lexicographical ordering of $\mathbb{R}^{m_S}$). We introduce $S^2\stackrel{\mathrm{def}}{=}\{(s,\ell)\in S\times S:\ \ell\succeq s\}$, the set of all ordered pairs of $S$. More generally, if $\Lambda$ is a subset of $S$, we denote by $\Lambda^2$ the set of all ordered pairs $(u,v)\in\Lambda\times\Lambda$ with $v\succeq u$.

Let $A, B_0:\mathsf{X}\to\mathbb{R}$ and $B:\mathsf{X}\times\mathsf{X}\to\mathbb{R}$ be known measurable functions such that $B(x,y)=B(y,x)$ (symmetry). We also assume that the diagonal of $B$ is $B_0$: $B(x,x)=B_0(x)$ for all $x\in\mathsf{X}$. We assume throughout the paper that

$$\|A\|_\infty<\infty,\quad\|B_0\|_\infty<\infty,\quad\text{and}\quad\|B\|_\infty<\infty. \qquad (3)$$

In the above, $\|f\|_\infty$ is the supremum norm.

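An infinite matrix is a map from $S\times S$ to $\mathbb{R}$. For an infinite matrix $\theta:S\times S\to\mathbb{R}$ and $s\in S$, the $\theta$-neighborhood of $s$ is the set

$$\partial_\theta s=\{\ell\in S:\ \ell\neq s\ \text{and}\ |\theta(s,\ell)|>0\},$$

and the $\theta$-degree of node $s$ is the (possibly infinite) quantity

$$\deg(s,\theta)\stackrel{\mathrm{def}}{=}\sum_{\ell\in S\setminus\{s\}}|\theta(s,\ell)|=\sum_{\ell\in\partial_\theta s}|\theta(s,\ell)|.$$

We denote by $\mathcal{M}$ the space of all infinite symmetric matrices $\theta$ such that $\deg(s,\theta)<\infty$ for all $s\in S$. For $q\in[1,\infty)$, we denote by $\mathcal{M}_q$ the Banach space of all infinite symmetric matrices $\theta\in\mathcal{M}$ such that

$$\|\theta\|_q\stackrel{\mathrm{def}}{=}\Big(\sum_{(s,\ell)\in S^2}|\theta(s,\ell)|^q\Big)^{1/q}<\infty.$$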

Let $(\Omega,\mathcal{F})=(\mathsf{X}^S,\mathcal{E}^S)$ be the product space equipped with the product topology and its Borel sigma-algebra. For $\theta\in\mathcal{M}$, let $\mu_\theta$ be a probability measure on $(\Omega,\mathcal{F})$ such that, if $\{X_s,\ s\in S\}$ is a stochastic process with distribution $\mu_\theta$, the conditional distribution of $X_s$ given the sigma-algebra generated by $\{X_\ell,\ \ell\neq s\}$ exists and has density (with respect to $\mu$) $f^{(s)}_\theta(\cdot\,|\,x)$, where for $u\in\mathsf{X}$, $x\in\mathsf{X}^{S\setminus\{s\}}$,

$$f^{(s)}_\theta(u\,|\,x)=\frac{1}{Z^{(s)}_\theta(x)}\exp\Big\{A(u)+\theta(s,s)B_0(u)+\sum_{\ell\in S\setminus\{s\}}\theta(s,\ell)B(u,x_\ell)\Big\}, \qquad (4)$$

for a normalizing constant $Z^{(s)}_\theta(x)$. Notice that $f^{(s)}_\theta(u\,|\,x)$ actually depends only on $x_{\partial_\theta s}=\{x_\ell:\ \ell\in\partial_\theta s\}$. Under (3) and for $\theta\in\mathcal{M}$, such a distribution $\mu_\theta$ exists (but might not be unique in general). We refer the reader to Appendix 1 for a precise definition and the existence of $\mu_\theta$. A random process $\{X_s,\ s\in S\}$ with distribution $\mu_\theta$ is called an infinite-volume auto-model random field. We denote by $\mathbb{E}_\theta$ the expectation operator with respect to $\mu_\theta$ on $(\Omega,\mathcal{F})$. When $\theta$ is the true network structure $\theta_\star$ (introduced below), we simply write $\mathbb{E}_\star$ instead of $\mathbb{E}_{\theta_\star}$. For $\Lambda\subseteq S$, we denote by $X_\Lambda$ the stochastic process $\{X_s,\ s\in\Lambda\}$. From $f^{(s)}_\theta$ and for a measurable function $H:\mathsf{X}\times\mathsf{X}^{S\setminus\{s\}}\to\mathbb{R}$, we can obtain the conditional expectation $\mathbb{E}_\theta\big(H(X_s,X_{S\setminus\{s\}})\,|\,X_{S\setminus\{s\}}\big)$ as $\int_{\mathsf{X}}H(u,X_{S\setminus\{s\}})f^{(s)}_\theta(u\,|\,X_{S\setminus\{s\}})\,\mu(du)$, provided the integral is well defined. We can similarly define the conditional variance $\mathrm{Var}_\theta\big(H(X_s,X_{S\setminus\{s\}})\,|\,X_{S\setminus\{s\}}\big)$.

For $\theta_\star\in\mathcal{M}$, let $\{X^{(i)},\ i\ge 1\}$ be a sequence of i.i.d. infinite-volume random fields with distribution $\mu_{\theta_\star}$, defined on some probability space with probability measure $\mathbb{P}_\star$ and expectation operator $\mathbb{E}_\star$. Let $\{D_n,\ n\ge 1\}$ be an increasing sequence of finite subsets of $S$ such that $D_n\uparrow S$. For a finite set $A$, $|A|$ denotes its cardinality, and we set $p_n=|D_n|$. For $n\ge 1$, let $d_n=p_n(p_n+1)/2$ and denote by $\mathcal{M}^{(n)}$ the space of all symmetric finite matrices $\{\theta(s,\ell),\ s,\ell\in D_n\}$, which we identify with $\mathbb{R}^{d_n}$.

We assume that for some $n\ge 1$, we partially observe each of the random fields $X^{(i)}$ ($1\le i\le n$) over the domain $D_n$, giving rise to observations $X^{(i)}_{D_n}=\{X^{(i)}_s,\ s\in D_n\}$. The remaining points $S\setminus D_n$ are not known and the associated random variables $X_{S\setminus D_n}$ are not observed. We are interested in estimating the infinite matrix $\theta_\star$. For $s\in S$, we define $\partial s=\partial_{\theta_\star}s$ and call it the (true) neighborhood of $s$. We also define $\partial_n s=D_n\setminus\{s\}$. Since the neighborhood system $\{\partial s,\ s\in S\}$ is not known, we introduce the approximate full conditional distributions

$$f^{(s)}_\theta(u\,|\,x_{\partial_n s})=\frac{1}{Z^{(s)}_\theta(x_{\partial_n s})}\exp\Big(A(u)+\theta(s,s)B_0(u)+\sum_{\ell\in\partial_n s}\theta(s,\ell)B(u,x_\ell)\Big), \qquad (5)$$

for some normalizing constant $Z^{(s)}_\theta(x_{\partial_n s})$. For $\lambda>0$, let $q_\lambda:[0,\infty)\to[0,\infty)$ be a penalty function.

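We then define the functions

$$\ell_n(\theta)\stackrel{\mathrm{def}}{=}\sum_{i=1}^{n}\sum_{s\in D_n}\log f^{(s)}_\theta\big(X^{(i)}_s\,|\,X^{(i)}_{\partial_n s}\big),\quad\text{and}\quad Q_n(\theta)\stackrel{\mathrm{def}}{=}\ell_n(\theta)-\sum_{(s,\ell)\in D_n^2}q_{\lambda_n}(|\theta(s,\ell)|),\quad\theta\in\mathcal{M}^{(n)},$$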

for some parameter $\lambda_n>0$. We are mainly interested in convex penalty functions, particularly the $\ell^1$ penalty, for which $q_\lambda(x)=\lambda x$. But we develop most of the results under the general condition A1 below, which applies in principle to non-convex penalties such as the SCAD penalty of Fan and Li (2001).



A1 For any $\lambda>0$, $q_\lambda(0)=0$, $q_\lambda$ is right-continuous at $0$ and differentiable on $(0,\infty)$, and

$$\sup_{\lambda>0}\ \sup_{x>0}\ |q'_\lambda(x)|/\lambda<\infty. \qquad (6)$$
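As an illustration of A1, the sketch below codes the SCAD penalty in the form given by Fan and Li (2001) and checks numerically that $|q'_\lambda(x)|/\lambda\le 1$, so that (6) holds. The default $a=3.7$ is the value commonly used with SCAD and is an assumption here; the paper does not fix $a$. The helper `a1_ratio` is hypothetical and only approximates the supremum on a grid.

```python
def scad(x, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at x >= 0."""
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def scad_deriv(x, lam, a=3.7):
    """Derivative q'_lam(x) for x > 0."""
    if x <= lam:
        return lam
    if x <= a * lam:
        return (a * lam - x) / (a - 1)
    return 0.0


def a1_ratio(lam, a=3.7, n_points=1000):
    """Grid approximation of sup_{x > 0} |q'_lam(x)| / lam over (0, 2 a lam]."""
    xs = [2 * a * lam * i / n_points for i in range(1, n_points + 1)]
    return max(abs(scad_deriv(x, lam, a)) for x in xs) / lam


print([a1_ratio(lam) for lam in (0.1, 1.0, 5.0)])
```

The ratio stays at 1 for every $\lambda$, because the SCAD derivative never exceeds $\lambda$ and vanishes beyond $a\lambda$. Note that SCAD is bounded, which is why Proposition 1.1 below does not cover it.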

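Finally, we define

$$\mathrm{Argmax}\,Q_n\stackrel{\mathrm{def}}{=}\Big\{\theta\in\mathcal{M}^{(n)}:\ Q_n(\theta)=\sup_{\vartheta\in\mathcal{M}^{(n)}}Q_n(\vartheta)\Big\},$$

and we call any element $\hat\theta_n$ of $\mathrm{Argmax}\,Q_n$ a maximizer of $Q_n$, that is, a penalized pseudo-likelihood estimator of $\theta_\star$.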
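A minimal sketch of $\ell_n$ and $Q_n$ in the auto-logistic case ($\mathsf{X}=\{0,1\}$, $A=0$, $B_0(u)=u$, $B(u,v)=uv$) with the $\ell^1$ penalty, labelling the observed nodes $0,\ldots,p-1$ (Remark 1 below explains why this is harmless). The function names are hypothetical, and the penalty is summed over pairs $s\le\ell$ with the diagonal included, which is one reading of the sum over $D_n^2$. Maximizing $Q_n$ (for instance by proximal-gradient or coordinate-descent methods) is not shown.

```python
import math


def log_pseudo_likelihood(theta, data):
    """ell_n(theta) for the auto-logistic model (A = 0, B_0(u) = u, B(u, v) = uv).

    theta : symmetric p x p matrix (list of lists), nodes labelled 0..p-1
    data  : list of observed 0/1 vectors of length p (one per replicate)
    """
    p = len(theta)
    total = 0.0
    for x in data:
        for s in range(p):
            eta = theta[s][s] + sum(theta[s][j] * x[j] for j in range(p) if j != s)
            # log f(x_s | rest) = x_s * eta - log(1 + e^eta), written stably
            total += x[s] * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def penalized_objective(theta, data, lam):
    """Q_n(theta) with the l1 penalty q_lam(x) = lam * x, summed over pairs s <= t."""
    p = len(theta)
    penalty = sum(lam * abs(theta[s][t]) for s in range(p) for t in range(s, p))
    return log_pseudo_likelihood(theta, data) - penalty


theta = [[0.3, 0.4], [0.4, -0.2]]
data = [[1, 0], [1, 1], [0, 0]]
print(penalized_objective(theta, data, lam=0.5))
```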
Remark 1. We want to stress the fact that the sets $S$ and $D_n$ are purely conceptual and need not be known. This is because we have replaced the full conditional density (4) by the approximation (5), in which the neighborhood of $s$ is $\partial_n s=D_n\setminus\{s\}$, and without any loss of generality we can replace $D_n$ by $\{1,\ldots,p_n\}$. As a result, the computation of $\hat\theta_n$ does not make use of $S$ and $D_n$. For instance, with the $\ell^1$ penalty, one obtains the same $\ell^1$-penalized pseudo-likelihood estimator as in Höfling and Tibshirani (2009) and Guo et al. (2010).

It is useful to have some simple conditions under which $\mathrm{Argmax}\,Q_n$ is not empty.

Proposition 1.1. Fix $n\ge 1$. Suppose that for any $s\in S$, there exists a finite constant $c(s)$ such that for all $\theta\in\mathcal{M}^{(n)}$, all $u\in\mathsf{X}$ and all $x_{\partial_n s}\in\mathsf{X}^{\partial_n s}$,

$$f^{(s)}_\theta(u\,|\,x_{\partial_n s})\le c(s).$$

Suppose also that for any $a,\lambda>0$, the set $\{x\ge 0:\ q_\lambda(x)\le a\}$ is bounded. Then $\mathrm{Argmax}\,Q_n$ is non-empty.

Remark 2. The result is not always useful. It applies to the $\ell^1$ penalty but not to the SCAD penalty. If $\mathsf{X}$ is finite, as in all the examples below, then $f^{(s)}_\theta(\cdot\,|\,x_{\partial_n s})$ is a probability mass function. Therefore the assumption of the proposition holds with $c(s)=1$.

Proof. Fix a sample path $\omega\in\Omega$. Then $Q_n$ is a continuous $\mathbb{R}$-valued function on $\mathcal{M}^{(n)}$. Denote by $0$ the null element of $\mathcal{M}^{(n)}$, and set $r=Q_n(0)$. Then $L_r\stackrel{\mathrm{def}}{=}\{\theta\in\mathcal{M}^{(n)}:\ Q_n(\theta)\ge r\}$ is nonempty and closed by continuity of $Q_n$. Under the assumptions of the proposition, if $\theta\in L_r$, then for any $(s,\ell)\in D_n^2$,

$$q_{\lambda_n}(|\theta(s,\ell)|)\le n\sum_{s'\in D_n}\log c(s')-r.$$

Thus $L_r$ is a compact subset of $\mathcal{M}^{(n)}$ and $Q_n$ attains its maximum at some $\hat\theta_n\in L_r$. □



1.2. Consistency and rate of convergence. Let $\mathcal{M}_1$ be the separable Banach space of all $\theta\in\mathcal{M}$ such that $\|\theta\|_1\stackrel{\mathrm{def}}{=}\sum_{(u,v)\in S^2}|\theta(u,v)|<\infty$. We investigate the consistency of $\hat\theta_n$ as a random element of $\mathcal{M}_1$ under the following sparsity assumption.

A2 $\theta_\star\in\mathcal{M}_1$ and, for any $s\in S$, the $\theta_\star$-neighborhood of $s$ (that is, the set $\partial_{\theta_\star}s=\{\ell\in S\setminus\{s\}:\ \theta_\star(s,\ell)\neq 0\}$) is a finite set.

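A2 guarantees that for $\theta\in\mathcal{M}_1$, $\theta+\theta_\star\in\mathcal{M}$, so that the full conditional densities $f^{(s)}_{\theta_\star+\theta}(u\,|\,x_{S\setminus\{s\}})$ are well defined. For two matrices $\theta,\theta'\in\mathcal{M}$, we write $\theta\cdot\theta'$ to denote the component-wise product. And if $\theta\in\mathcal{M}$ and $n\ge 1$, $\theta^{(n)}$ denotes the element of $\mathcal{M}^{(n)}$ such that $\theta^{(n)}(u,v)=\theta(u,v)$ if $(u,v)\in D_n\times D_n$ (and $\theta^{(n)}(u,v)=0$ otherwise). We introduce

$$U_n(\theta)\stackrel{\mathrm{def}}{=}-\frac{1}{n}\sum_{i=1}^{n}\sum_{s\in D_n}\log\frac{f^{(s)}_{\theta_\star+\theta}(X^{(i)}_s\,|\,X^{(i)}_{\partial_n s})}{f^{(s)}_{\theta_\star}(X^{(i)}_s\,|\,X^{(i)}_{\partial_n s})}+\frac{1}{n}\sum_{(s,\ell)\in D_n^2}\Big(q_{\lambda_n}(|\theta_\star(s,\ell)+\theta(s,\ell)|)-q_{\lambda_n}(|\theta_\star(s,\ell)|)\Big),\quad\theta\in\mathcal{M}_1. \qquad (7)$$

$U_n(\theta)$ is none other than $n^{-1}\big(Q_n(\theta_\star^{(n)})-Q_n(\theta_\star^{(n)}+\theta^{(n)})\big)$ and is minimized at $\hat\theta_n-\theta_\star^{(n)}$. We also introduce the conditional Kullback-Leibler divergence function

$$\kappa^{(s)}(\theta_\star,\theta)\stackrel{\mathrm{def}}{=}\mathbb{E}_\star\Big[\int_{\mathsf{X}}\log\frac{f^{(s)}_{\theta_\star}(u\,|\,X_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star+\theta}(u\,|\,X_{S\setminus\{s\}})}\,f^{(s)}_{\theta_\star}(u\,|\,X_{S\setminus\{s\}})\,\mu(du)\Big],\quad\theta\in\mathcal{M}_1. \qquad (8)$$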



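By the concavity of the logarithm function, $\kappa^{(s)}(\theta_\star,\theta)\ge 0$. Finally, we define

$$\kappa_n(\theta_\star,\theta)\stackrel{\mathrm{def}}{=}\sum_{s\in D_n}\kappa^{(s)}(\theta_\star,\theta),\quad\text{and}\quad\kappa(\theta_\star,\theta)\stackrel{\mathrm{def}}{=}\sum_{s\in S}\kappa^{(s)}(\theta_\star,\theta). \qquad (9)$$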



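Clearly, $\kappa_n(\theta_\star,\theta)$ is nondecreasing in $n$ and converges to $\kappa(\theta_\star,\theta)$. For any $\theta\in\mathcal{M}_1$ and $u\in\mathsf{X}$, we use (29) and (3) to verify that

$$\Big|\log f^{(s)}_{\theta_\star+\theta}(u\,|\,X_{S\setminus\{s\}})-\log f^{(s)}_{\theta_\star}(u\,|\,X_{S\setminus\{s\}})\Big|\le 2\sup_{v\in\mathsf{X}}\Big|\theta(s,s)B_0(v)+\sum_{\ell\in S\setminus\{s\}}\theta(s,\ell)B(v,X_\ell)\Big|\le C\Big(|\theta(s,s)|+\sum_{\ell\in S\setminus\{s\}}|\theta(s,\ell)|\Big),$$

for some finite constant $C$. Since each off-diagonal pair is counted twice when summing over $s$, this implies that $\sum_{s\in D_n}\kappa^{(s)}(\theta_\star,\theta)\le 2C\|\theta\|_1<\infty$ and proves that $\kappa(\theta_\star,\theta)$ is finite. Also notice that $\mathrm{Argmin}\,\kappa(\theta_\star,\cdot)$ is nonempty and contains the null matrix $0$.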
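When $\mathsf{X}$ is finite (counting measure) and the conditioning configuration is held fixed, the integral inside (8) is an ordinary Kullback-Leibler divergence between two probability mass functions. The sketch below, for an auto-logistic full conditional with arbitrary illustrative values of the linear predictor, checks nonnegativity and the pointwise log-ratio bound used above; it does not compute the outer expectation $\mathbb{E}_\star$.

```python
import math


def binary_conditional(eta):
    """pmf on {0, 1} of an auto-logistic full conditional with linear predictor eta."""
    p1 = 1.0 / (1.0 + math.exp(-eta))
    return [1.0 - p1, p1]


def kl_divergence(p, q):
    """sum_u p(u) log(p(u) / q(u)) for pmfs on the same finite set."""
    return sum(pu * math.log(pu / qu) for pu, qu in zip(p, q) if pu > 0)


# eta_star: linear predictor under theta_star at a fixed configuration.
# delta: the extra term theta(s,s) B_0(u) + sum_l theta(s,l) B(u, x_l) at u = 1;
# it vanishes at u = 0 because B_0(0) = 0 and B(0, .) = 0 in the auto-logistic model.
eta_star, delta = 0.4, -1.3
f_star = binary_conditional(eta_star)
f_pert = binary_conditional(eta_star + delta)
kl = kl_divergence(f_star, f_pert)
log_ratios = [abs(math.log(a) - math.log(b)) for a, b in zip(f_pert, f_star)]
print(kl, max(log_ratios), 2 * abs(delta))   # kl <= max log-ratio <= 2 |delta|
```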



The consistency of $\hat\theta_n$ is studied through epi-convergence, as in Hess (1996). We review some definitions. For more on epi-convergence, we refer to Dal Maso (1993). Let $(V,d)$ be a metric space and $\{f,f_n,\ n\ge 1\}$ be a sequence of functions defined on $V$ and taking values in $\mathbb{R}\cup\{-\infty,+\infty\}$. The epi-limit inferior of $\{f_n,\ n\ge 1\}$ is the function

$$\mathrm{li}_e f_n(x)=\sup_{k\ge 1}\ \liminf_{n\to\infty}\ \inf_{v\in B(x,k^{-1})}f_n(v),$$

where $B(x,\epsilon)$ denotes the open ball of $V$ with center $x$ and radius $\epsilon$. We define similarly the epi-limit superior of $\{f_n,\ n\ge 1\}$ as

$$\mathrm{ls}_e f_n(x)=\sup_{k\ge 1}\ \limsup_{n\to\infty}\ \inf_{v\in B(x,k^{-1})}f_n(v).$$

We say that $f_n$ epi-converges (or $\Gamma$-converges) to $f$ if $\mathrm{ls}_e f_n(x)\le f(x)\le\mathrm{li}_e f_n(x)$ for all $x\in V$.



YVES F. ATCHADE 



Theorem 1.2. Assume (3), A1 and A2, and suppose that $n^{-1}\lambda_n=o(1)$ as $n\to\infty$. Then, almost surely, $U_n$ epi-converges to $\kappa(\theta_\star,\cdot)$ in $(\mathcal{M}_1,\|\cdot\|_1)$.

Proof. See Section 2.2. □



Epi-convergence is a very useful tool in the study of minimizers. The key result in that respect is as follows (using the notation of the above paragraph). If $f_n$ epi-converges to $f$ and $\{x_n,\ n\ge 1\}$ is such that $x_n\in\mathrm{Argmin}\,f_n$, then if $x_n\to x$ (in the metric space $(V,d)$), $x\in\mathrm{Argmin}\,f$. In order to make use of this result in our case, we need to impose additional conditions that ensure that $\hat\theta_n$ converges and that the limiting function $\kappa(\theta_\star,\cdot)$ admits a unique minimum.

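For $s\in S$ and $\theta\in\mathcal{M}$, define the infinite matrix $\rho^{(s)}_\theta=\{\rho^{(s)}_\theta(\ell,\ell'),\ \ell,\ell'\in S\}$, where

$$\rho^{(s)}_\theta(\ell,\ell')\stackrel{\mathrm{def}}{=}\mathbb{E}_\star\Big[\mathrm{Cov}_\theta\big(B(X_s,X_\ell),B(X_s,X_{\ell'})\,|\,X_{\partial_\theta s}\big)\Big],\quad\ell,\ell'\in S.$$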



Corollary 1.3. Suppose that the assumptions of Theorem 1.2 hold and also that for any $\theta\in\mathcal{M}$ and any $s\in S$, $\rho^{(s)}_\theta$ is a positive definite matrix. Let $\{\hat\theta_n,\ n\ge 1\}$ be a Borel measurable sequence of $\mathcal{M}_1$ such that $\hat\theta_n\in\mathrm{Argmax}\,Q_n$. If $\{(\hat\theta_n-\theta_\star),\ n\ge 1\}$ is uniformly tight as a random sequence of $\mathcal{M}_1$, then $\|\hat\theta_n-\theta_\star\|_1$ converges in probability to zero.

Proof. See Section 2.2.3. □



The tightness condition is needed but is in general difficult to check. Intuitively, the tightness of $\{(\hat\theta_n-\theta_\star),\ n\ge 1\}$ implies that the overall dependence between the missing nodes and the observed nodes is limited. We will not attempt to make this statement precise. Instead, we study more precisely the connection between the missing nodes and the rate of convergence of $\hat\theta_n$. We assume that the following holds (see Section 1.2.1 for a discussion).

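A3 Assume that there exist $\alpha_n,\alpha'_n>0$ such that for all $\theta,\theta'\in\mathcal{M}^{(n)}$,

$$\sum_{s\in D_n}\mathbb{E}_\star\Big[\mathrm{Var}_{\theta'}\Big(\sum_{\ell\in D_n}\theta(s,\ell)B(X_s,X_\ell)\,\Big|\,X_{\partial_n s}\Big)\Big]\ge\alpha_n\|\theta\|_2^2,$$

and

$$\mathbb{E}_\star\Big[\Big(\sum_{s\in D_n}\big\langle\nabla\log f^{(s)}_{\theta'}\big(X^{(1)}_s\,|\,X^{(1)}_{\partial_n s}\big),\theta\big\rangle\Big)^2\Big]\le\alpha'_n\|\theta\|_2^2,$$

for all $n$ large enough.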

Let $\{a_n,\ n\ge 1\}$ be a sequence of positive numbers. We define $\mathcal{M}^{(n)}(a_n)$ as the set of all finite $p_n\times p_n$ symmetric matrices $\theta$ such that

$$\big|\{(s,\ell)\in D_n^2:\ |\theta(s,\ell)|>0\}\big|\le a_n.$$

In other words, $\mathcal{M}^{(n)}(a_n)$ is the set of elements of $\mathcal{M}^{(n)}$ with sparsity $a_n$. We introduce $\partial^{(n)}\stackrel{\mathrm{def}}{=}\{s\in D_n:\ \partial s\setminus D_n\neq\emptyset\}$, the set of observed nodes that admit neighbors outside $D_n$. We can think of $\partial^{(n)}$ as the boundary of $D_n$. Let $\{\tau_n,\ n\ge 1\}$ be another sequence of positive numbers. We define $\mathcal{M}^{(n)}(a_n,\tau_n)$ as the set of all $\theta\in\mathcal{M}^{(n)}(a_n)$ such that

$$\Big(\sum_{s\in\partial^{(n)}}\Big(\sum_{\ell\in D_n}|\theta(s,\ell)|\Big)^2\Big)^{1/2}\le\tau_n\|\theta\|_2.$$



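We relate the behavior of the estimator $\hat\theta_n$ to the class of functions $\mathcal{F}_{n,\delta}\stackrel{\mathrm{def}}{=}\{m_{n,\theta},\ \theta\in\mathcal{B}_{n,\delta}\}$, where $\mathcal{B}_{n,\delta}\stackrel{\mathrm{def}}{=}\{\theta\in\mathcal{M}^{(n)}(a_n,\tau_n):\ \|\theta\|_2\le\delta\}$, $\delta>0$, and

$$m_{n,\theta}(x)=\sum_{s\in D_n}\log\frac{f^{(s)}_{\theta_\star^{(n)}+\theta}(x_s\,|\,x_{\partial_n s})}{f^{(s)}_{\theta_\star^{(n)}}(x_s\,|\,x_{\partial_n s})}.$$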

It is clear that the size of the family $\mathcal{F}_{n,\delta}$ depends on the size of $\mathcal{B}_{n,\delta}$. By the sparsity of $\mathcal{M}^{(n)}(a_n,\tau_n)$, and for $a_n\le d_n/2$ (we recall that $d_n=p_n(p_n+1)/2$, where $p_n=|D_n|$), we have

$$N(\epsilon,\mathcal{B}_{n,\delta},\|\cdot\|_2)\le\Big(\frac{c\,d_n\,\delta}{a_n\,\epsilon}\Big)^{a_n}, \qquad (11)$$

for some universal constant $c$, where $N(\epsilon,\mathcal{B}_{n,\delta},\|\cdot\|_2)$ denotes the $\epsilon$-covering number of the set $\mathcal{B}_{n,\delta}$ with respect to the $\ell^2$-norm on $\mathcal{M}^{(n)}(a_n,\tau_n)$. To see this, notice that the $\epsilon$-covering number of the $\ell^2$-ball of $\mathbb{R}^{a_n}$ with radius $\delta$ is bounded from above by $(3\delta/\epsilon)^{a_n}$. For $\theta\in\mathcal{M}^{(n)}(a_n,\tau_n)$, since the number of non-zero entries of $\theta$ is bounded from above by $a_n$, there are at most $\binom{d_n}{a_n}$ ways of forming $\theta$ from a sequence of $a_n$ non-zero elements of $\mathbb{R}^{a_n}$. Thus $N(\epsilon,\mathcal{B}_{n,\delta},\|\cdot\|_2)\le\binom{d_n}{a_n}(3\delta/\epsilon)^{a_n}$. By Stirling's formula,

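$$\binom{d_n}{a_n}\le\sqrt{\frac{d_n}{2\pi a_n(d_n-a_n)}}\exp\Big(-a_n\log(a_n/d_n)-(d_n-a_n)\log(1-a_n/d_n)\Big)\le\exp\big(a_n(\log d_n-\log a_n+c)\big),$$

for some finite constant $c$, which leads to (11). See also Vershynin (2009). Finally, we introduce

$$b_n\stackrel{\mathrm{def}}{=}\Big(\sum_{s\in\partial^{(n)}}\Big(\sum_{\ell\in S\setminus D_n}|\theta_\star(s,\ell)|\Big)^2\Big)^{1/2},$$

which measures the strength of the dependence between the missing nodes and the observed nodes. Our main result is as follows.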

Theorem 1.4. Assume (3) and A1-A3. Suppose that, as $n\to\infty$, $a_n\sqrt{\log p_n}=o(\alpha_n n^{1/2})$ and $\lambda_n=O(\sqrt{n\log p_n})$, and also that $(\hat\theta_n-\theta_\star^{(n)})\in\mathcal{M}^{(n)}(a_n,\tau_n)$ for all $n$ large enough. Then $r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2=O_p(1)$ as $n\to\infty$, where $r_n=\alpha_n\sqrt{n}/\big(\sqrt{a_n\log p_n}+\sqrt{n}\,b_n\tau_n\big)$.

Proof. See Section 2.3. □
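The constant in the covering-number bound (11) can be made explicit with the elementary inequality $\binom{d}{a}\le(ed/a)^a$, which yields (11) with $c=3e$. The sketch below checks this numerically on the log scale, using `math.lgamma` to evaluate $\log\binom{d}{a}$; the helper names and the grid of $(p,a)$ values are illustrative choices.

```python
import math


def log_binom(d, a):
    """log C(d, a) via log-gamma."""
    return math.lgamma(d + 1) - math.lgamma(a + 1) - math.lgamma(d - a + 1)


def log_cover_bound(d, a, ratio):
    """log of C(d, a) * (3 * ratio)^a, with ratio = delta / eps."""
    return log_binom(d, a) + a * math.log(3 * ratio)


def log_bound_11(d, a, ratio):
    """log of (c * d * ratio / a)^a with c = 3e, i.e. (11) with an explicit constant."""
    return a * (math.log(3 * math.e) + math.log(d) - math.log(a) + math.log(ratio))


def check(p, ratio=10.0):
    """True if the intermediate bound never exceeds (11) for 1 <= a <= d_n / 2."""
    d = p * (p + 1) // 2
    return all(log_cover_bound(d, a, ratio) <= log_bound_11(d, a, ratio) + 1e-9
               for a in range(1, d // 2 + 1))


print(all(check(p) for p in (10, 50, 80)))
```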
The theorem implies that $\hat\theta_n$ is consistent in estimating $\theta_\star$ if $\tau_n b_n=o(\alpha_n)$. In the above result, we need to find $a_n$ and $\tau_n$ that guarantee that $(\hat\theta_n-\theta_\star^{(n)})\in\mathcal{M}^{(n)}(a_n,\tau_n)$ for all $n$ large enough. Notice that for any $\theta\in\mathcal{M}^{(n)}(a_n)$ we have, by the Cauchy-Schwarz inequality,

$$\Big(\sum_{s\in\partial^{(n)}}\Big(\sum_{\ell\in D_n}|\theta(s,\ell)|\Big)^2\Big)^{1/2}\le 2\sup_{s\in\partial^{(n)}}\big|\{\ell\in D_n:\ |\theta(s,\ell)|>0\}\big|^{1/2}\,\|\theta\|_2.$$

This means that any $\theta\in\mathcal{M}^{(n)}(a_n)$ also belongs to $\mathcal{M}^{(n)}(a_n,\tau_n)$ for $\tau_n=2N_n^{1/2}$, where $N_n\stackrel{\mathrm{def}}{=}\sup_{\theta\in\mathcal{M}^{(n)}(a_n)}\sup_{s\in\partial^{(n)}}|\{\ell\in D_n:\ |\theta(s,\ell)|>0\}|$. Therefore, with this choice of $\tau_n$, we can replace $\mathcal{M}^{(n)}(a_n,\tau_n)$ by $\mathcal{M}^{(n)}(a_n)$ in the above theorem. This leads to the following reformulation.

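Corollary 1.5. Assume (3) and A1-A3. Suppose that, as $n\to\infty$, $a_n\sqrt{\log p_n}=o(\alpha_n n^{1/2})$ and $\lambda_n=O(\sqrt{n\log p_n})$, and also that $(\hat\theta_n-\theta_\star^{(n)})\in\mathcal{M}^{(n)}(a_n)$ for all $n$ large enough. Then $r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2=O_p(1)$ as $n\to\infty$, where $r_n=\alpha_n\sqrt{n}/\big(\sqrt{a_n\log p_n}+\sqrt{n}\,b_n N_n^{1/2}\big)$.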
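The Cauchy-Schwarz step behind the choice $\tau_n=2N_n^{1/2}$ is easy to test numerically. The sketch draws random sparse symmetric matrices (an arbitrary test distribution) and an arbitrary set of "boundary" rows, then checks the inequality displayed before Corollary 1.5, with $\|\theta\|_2$ computed over the pairs $s\le\ell$ as in the definition of $\mathcal{M}_2$. All function names are hypothetical.

```python
import math
import random


def boundary_lhs(theta, boundary):
    """(sum over s in boundary of (sum_l |theta[s][l]|)^2)^(1/2)."""
    return math.sqrt(sum(sum(abs(v) for v in theta[s]) ** 2 for s in boundary))


def norm2_upper(theta):
    """||theta||_2 over the pairs s <= t (diagonal included)."""
    p = len(theta)
    return math.sqrt(sum(theta[s][t] ** 2 for s in range(p) for t in range(s, p)))


def boundary_degree(theta, boundary):
    """Largest number of non-zero entries in a row indexed by the boundary."""
    return max(sum(1 for v in theta[s] if v != 0.0) for s in boundary)


def random_sparse_symmetric(p, n_entries, rng):
    """Symmetric p x p matrix with at most n_entries random non-zero pairs."""
    theta = [[0.0] * p for _ in range(p)]
    for _ in range(n_entries):
        s, t = rng.randrange(p), rng.randrange(p)
        theta[s][t] = theta[t][s] = rng.uniform(-2.0, 2.0)
    return theta


rng = random.Random(0)
theta = random_sparse_symmetric(30, 40, rng)
boundary = rng.sample(range(30), 8)
lhs = boundary_lhs(theta, boundary)
rhs = 2 * math.sqrt(boundary_degree(theta, boundary)) * norm2_upper(theta)
print(lhs <= rhs)
```

The factor 2 is conservative: the argument actually gives $\sqrt{2N_n}$, since each off-diagonal entry appears twice in the row sums.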



If the penalty function is $q_\lambda(x)=\lambda x$, the extensive recent literature on the lasso points to the fact that the estimator $\hat\theta_n$ is sparse and recovers the sparsity structure of the true network $\theta_\star$ (Banerjee et al. (2008); Meinshausen and Yu (2009); Guo et al. (2010)). This suggests that $a_n$ can be taken proportional to the sparsity of $\theta_\star^{(n)}$, that is,

$$a_n\propto\big|\{(s,\ell)\in D_n^2:\ |\theta_\star(s,\ell)|>0\}\big|.$$

Finally, we point out that if $b_n=0$, then there is no missing data problem. In that case, Theorem 1.4 yields a rate of convergence similar to that of Guo et al. (2010).



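1.2.1. Comment on A3. For $s\in D_n$ and $\theta\in\mathcal{M}^{(n)}$, consider the matrices $\rho^{(s)}_{n,\theta}=\{\rho^{(s)}_{n,\theta}(\ell,\ell'),\ \ell,\ell'\in D_n\}$ and $\bar\rho_{n,\theta}=\{\bar\rho_{n,\theta}(s,\ell;s',\ell'),\ (s,\ell),(s',\ell')\in D_n\times D_n\}$, where

$$\rho^{(s)}_{n,\theta}(\ell,\ell')\stackrel{\mathrm{def}}{=}\mathbb{E}_\star\Big[\mathrm{Cov}_\theta\big(B(X_s,X_\ell),B(X_s,X_{\ell'})\,|\,X_{\partial_n s}\big)\Big],$$

and

$$\bar\rho_{n,\theta}(s,\ell;s',\ell')\stackrel{\mathrm{def}}{=}\mathbb{E}_\star\Big[\big(B(X_s,X_\ell)-\mathbb{E}_\theta(B(X_s,X_\ell)\,|\,X_{\partial_n s})\big)\big(B(X_{s'},X_{\ell'})-\mathbb{E}_\theta(B(X_{s'},X_{\ell'})\,|\,X_{\partial_n s'})\big)\Big].$$

It can easily be seen that if the smallest eigenvalue of $\rho^{(s)}_{n,\theta}$ is bounded from below by $\alpha_n>0$, uniformly in $s$ and $\theta$, then the first part of A3 holds. Similarly, if the largest eigenvalue of $\bar\rho_{n,\theta}$ is bounded from above by $\alpha'_n<\infty$, uniformly in $\theta$, then the second part of A3 holds.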

In many practical examples, $B(x,y)=B_0(x)B_0(y)$, where $B_0$ is a nonnegative and bounded function. In that case, one can often check A3. Indeed, the matrix $\rho^{(s)}_{n,\theta}$ becomes $\rho^{(s)}_{n,\theta}(\ell,\ell')=\mathbb{E}_\star\big(B_{s\ell}B_{s\ell'}\mathrm{Var}_\theta(B_0(X_s)\,|\,X_{\partial_n s})\big)$, where $B_{s\ell}=1$ if $\ell=s$ and $B_{s\ell}=B_0(X_\ell)$ otherwise. If there exists $c_n>0$ such that $\mathrm{Var}_\theta(B_0(X_s)\,|\,X_{\partial_n s})\ge c_n$ for all $s\in D_n$ and all $\theta\in\mathcal{M}^{(n)}$, then the first part of A3 holds with $\alpha_n=c_n\alpha_{n,0}$, where $\alpha_{n,0}>0$ is the smallest of the eigenvalues of the matrices $\mathbb{E}_\star[B_{s\ell}B_{s\ell'}]$. Similarly,

$$\bar\rho_{n,\theta}(s,\ell;s',\ell')=\mathbb{E}_\star\Big[B_{s\ell}B_{s'\ell'}\big(B_0(X_s)-\mathbb{E}_\theta(B_0(X_s)\,|\,X_{\partial_n s})\big)\big(B_0(X_{s'})-\mathbb{E}_\theta(B_0(X_{s'})\,|\,X_{\partial_n s'})\big)\Big]\le C\,\mathbb{E}_\star\big(B_{s\ell}B_{s'\ell'}\big).$$

Then the second part of A3 holds and we can take $\alpha'_n$ proportional to the largest eigenvalue of $\mathbb{E}_\star\big(B_{s\ell}B_{s'\ell'}\big)$.

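Example 1 (The auto-binomial and auto-logistic models). We consider here the particular case of the auto-binomial model, which is an extension of the popular auto-logistic model. The auto-binomial model can be used for data where the observation available at each node can be seen as a number of successes out of a given common number of trials. Fix $k\ge 1$, the number of trials, and set $\mathsf{X}=\{0,\ldots,k\}$ (with $\mu$ the counting measure). The interaction functions of the auto-binomial model are given by $A(u)=\log\binom{k}{u}$, $B_0(u)=u$ and $B(u,v)=uv$. The particular case $k=1$ corresponds to the auto-logistic model. The modeling assumption here is that for any $s\in S$,

$$X_s\,|\,X_{\partial_\theta s}=x_{\partial_\theta s}\sim\mathcal{B}\big(k,a^{(s)}_\theta(x_{\partial_\theta s})\big),\quad\text{where}\quad\log\frac{a^{(s)}_\theta(x_{\partial_\theta s})}{1-a^{(s)}_\theta(x_{\partial_\theta s})}=\theta(s,s)+\sum_{\ell\in\partial_\theta s}\theta(s,\ell)x_\ell.$$

In the above display, $\mathcal{B}(n,p)$ denotes the binomial distribution with parameters $n,p$. Now for $\theta\in\mathcal{M}^{(n)}$, $\mathrm{Var}_\theta(B_0(X_s)\,|\,X_{\partial_n s})=k\,a^{(s)}_\theta(1-a^{(s)}_\theta)$, where $a^{(s)}_\theta=\big(1+\exp(-\theta(s,s)-\sum_{j\in\partial_n s}\theta(s,j)X_j)\big)^{-1}$. If we insist that $\sup_{s,\ell\in D_n}|\theta(s,\ell)|\le K$ and that $\theta\in\mathcal{M}^{(n)}(a_n)$, then, since $a(1-a)\ge e^{-|\eta|}/4$ when $a=(1+e^{-\eta})^{-1}$,

$$\mathrm{Var}_\theta(B_0(X_s)\,|\,X_{\partial_n s})\ge 4^{-1}k\,e^{-2KkN_n},\quad s\in D_n,\ \theta\in\mathcal{M}^{(n)}(a_n),$$

where $N_n\stackrel{\mathrm{def}}{=}\sup_{\theta\in\mathcal{M}^{(n)}(a_n)}\sup_{s\in D_n}|\{\ell\in D_n:\ |\theta(s,\ell)|>0\}|$ is the maximum degree in $\mathcal{M}^{(n)}(a_n)$ (assumed to be at least $1$). It follows that A3 holds with $\alpha_n=\alpha_{n,0}4^{-1}k\,e^{-2KkN_n}$, where $\alpha_{n,0}>0$ is the smallest of the eigenvalues of the matrices $\mathbb{E}_\star(B_{s\ell}B_{s\ell'})$, and $\alpha'_n$ can be taken proportional to the largest eigenvalue of $\mathbb{E}_\star(B_{s\ell}B_{s'\ell'})$.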
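The variance bound in Example 1 rests on two facts: $a(1-a)\ge e^{-|\eta|}/4$ for $a=(1+e^{-\eta})^{-1}$, and $|\eta|\le K+KkN_n\le 2KkN_n$ when all entries are bounded by $K$, the $x_\ell$ lie in $\{0,\ldots,k\}$ and at most $N_n\ge 1$ neighbours are active. The sketch below checks the resulting inequality on random configurations. The sampling ranges are arbitrary assumptions, and the conditional variance is computed in the numerically stable form $k\,e^{-|\eta|}/(1+e^{-|\eta|})^2$.

```python
import math
import random


def autobinomial_var(k, eta):
    """k a (1 - a) with a = 1 / (1 + exp(-eta)), computed as k e^{-|eta|} / (1 + e^{-|eta|})^2."""
    e = math.exp(-abs(eta))
    return k * e / (1.0 + e) ** 2


def variance_lower_bound(k, K, N):
    """The bound 4^{-1} k exp(-2 K k N) of Example 1 (assumes N >= 1)."""
    return 0.25 * k * math.exp(-2.0 * K * k * N)


def random_check(rng):
    """Draw one random configuration and test Var >= bound."""
    k = rng.randint(1, 4)                # number of trials
    K = rng.uniform(0.05, 1.5)           # bound on |theta(s, l)|
    N = rng.randint(1, 5)                # number of non-zero neighbours
    eta = rng.uniform(-K, K)             # theta(s, s)
    eta += sum(rng.uniform(-K, K) * rng.randint(0, k) for _ in range(N))
    return autobinomial_var(k, eta) >= variance_lower_bound(k, K, N)


rng = random.Random(1)
print(all(random_check(rng) for _ in range(5000)))
```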

1.3. Monte Carlo evidence. We consider the auto-logistic model where $\mathsf{X}=\{0,1\}$, $A(x)=0$, $B_0(x)=x$, and $B(x,y)=xy$. We work with the $\ell^1$ penalty: $q_\lambda(x)=\lambda x$. With respect to the number of nodes, we consider two cases: $p=50$ and $p=80$. For each case, we consider different values of $n$ (the sample size) through the formula $n=a\log p/\beta^2$, where $a$ is the number of non-zero elements of the true network structure, which we choose to be approximately $1.3p$, and where $\beta$ is chosen in the range $[0.3,2.0]$ (for $p=50$) and $[0.6,2.0]$ (for $p=80$).
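The sample sizes implied by $n=a\log p/\beta^2$ with $a\approx 1.3p$ are easy to tabulate. The sketch below assumes that $a$ is rounded to the nearest integer and $n$ rounded up; the paper does not state how these quantities are rounded, and `sample_size` is a hypothetical helper.

```python
import math


def sample_size(p, beta, density=1.3):
    """n = a log(p) / beta^2 with a = round(density * p).

    Rounding a to the nearest integer and n up to the next integer are
    assumptions made for this sketch.
    """
    a = round(density * p)
    return math.ceil(a * math.log(p) / beta ** 2)


for p, betas in [(50, [0.3, 1.0, 2.0]), (80, [0.6, 1.0, 2.0])]:
    print(p, [sample_size(p, b) for b in betas])
```

For example, $p=50$ gives $a=65$ and, at $\beta=1$, $n=65\log 50\approx 254.3$, rounded up to 255.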

We compare three settings. In Setting 1, there is no missing data, and the samples are generated exactly from (1) with $\theta=\theta_\star$ (we set up $\theta_\star$ such that $\theta_\star(s,\ell)\ge 0$, and we use the Propp-Wilson perfect sampler). In Settings 2 and 3, we generate the sample $(X^{(i)}_1,\ldots,X^{(i)}_p,X^{(i)}_{p+1},\ldots,X^{(i)}_{p+r})$ from (1) with $\theta=\theta_\star$, and we retain only $(X^{(i)}_1,\ldots,X^{(i)}_p)$, for $1\le i\le n$. Thus there are $r$ missing nodes. In Setting 2, we use $r=8$, whereas in Setting 3, we set $r=20$. Table 1 shows the corresponding values of $b_n$ in each setting.





            Setting 1, r = 0    Setting 2, r = 8    Setting 3, r = 20
  p = 50    0                   1.8                 4.41
  p = 80    0                   1.8                 3.6

Table 1. Values of $b_n$ in each setting of the simulation.



Regardless of the data generation mechanism, we fit model (1) by $\ell^1$-penalized pseudo-likelihood and compute the relative mean squared error $\mathbb{E}_\star\big(\|\hat\theta-\theta_\star\|_2^2\big)/\|\theta_\star\|_2^2$, estimated from $K$ replications of the estimator ($K=50$). In Figure 1, we plot this relative mean squared error as a function of $\beta$. As expected, the more missing data, the worse the estimator behaves. Notice that in Setting 2 (where $r=8$), the loss of accuracy of the estimator is worse for $p=50$ than for $p=80$, although we have the same value $b_n=1.8$. This points to the fact that, in the rate of convergence of $\hat\theta$, the factor $b_n$ is modulated by a factor related to the size of the problem, as predicted in Theorem 1.4 (the term $\tau_n$).




Figure 1: Relative MSE versus $\beta$, where the star-line is Setting 1, the square-line is Setting 2, and the triangle-line is Setting 3. (a) $p=50$, (b) $p=80$.
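Settings 1 to 3 rely on exact draws from (1). The sketch below is one way to implement monotone coupling from the past (the Propp-Wilson scheme) for an auto-logistic model with nonnegative off-diagonal interactions, using a systematic-scan Gibbs sampler as the coupled update. It is an illustrative implementation under these assumptions, not the code used to produce Figure 1, and the function names are hypothetical.

```python
import math
import random


def gibbs_sweep(x, theta, uniforms):
    """One systematic-scan Gibbs sweep of the auto-logistic model, driven by given uniforms."""
    p = len(theta)
    for s in range(p):
        eta = theta[s][s] + sum(theta[s][j] * x[j] for j in range(p) if j != s)
        x[s] = 1 if uniforms[s] < 1.0 / (1.0 + math.exp(-eta)) else 0


def propp_wilson(theta, rng):
    """Exact draw by monotone coupling from the past.

    Valid only when theta(s, l) >= 0 for s != l, so that the Gibbs update is
    monotone and the all-ones and all-zeros chains sandwich every other chain.
    """
    p = len(theta)
    sweeps = []                       # sweeps[t] holds the uniforms used at time -(t + 1)
    T = 1
    while True:
        while len(sweeps) < T:        # extend further into the past, reusing old uniforms
            sweeps.append([rng.random() for _ in range(p)])
        upper, lower = [1] * p, [0] * p
        for t in range(T - 1, -1, -1):    # run from time -T up to time 0
            gibbs_sweep(upper, theta, sweeps[t])
            gibbs_sweep(lower, theta, sweeps[t])
        if upper == lower:
            return upper
        T *= 2


# Illustrative Setting-2-style draw: simulate p + r nodes and keep the first p
rng = random.Random(0)
p, r = 4, 2
theta_full = [[-0.5 if s == t else 0.3 for t in range(p + r)] for s in range(p + r)]
x_observed = propp_wilson(theta_full, rng)[:p]
print(x_observed)
```

Reusing the same uniforms for the same time steps when the starting time is pushed back is what makes the output exact rather than approximate.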



2. Proofs 



2.1. Some basic facts on infinite-volume auto-models. We recall from Georgii (1988) some basic facts on Gibbs distributions. Let $(\mathsf{X},\mathcal{E},\mu)$ and $S$ be as in Section 1.1. Let $(\Omega,\mathcal{F})=(\mathsf{X}^S,\mathcal{E}^S)$ be the product space equipped with the product Borel sigma-algebra. We will need a few more notations. We denote by $X_s$ the projection maps, that is, $X_s:(\Omega,\mathcal{F})\to(\mathsf{X},\mathcal{E})$ such that $X_s(\omega)=\omega_s$. For $A\subseteq\mathsf{X}$ and $\Lambda\subseteq S$, we denote by $A^\Lambda$ the product set $\{(u_s)_{s\in\Lambda}:\ u_s\in A\}$. We define $X_\Lambda:(\Omega,\mathcal{F})\to(\mathsf{X}^\Lambda,\mathcal{E}^\Lambda)$ as $X_\Lambda(\omega)=\{\omega_s,\ s\in\Lambda\}$, and we denote by $\mathcal{F}_\Lambda$ the sub-$\sigma$-algebra of $\mathcal{F}$ generated by the maps $X_U$, $U\subseteq\Lambda$, $U$ finite. For two disjoint subsets $\Lambda,\Delta$ of $S$, if $u=\{x_i,\ i\in\Lambda\}$ and $v=\{x_i,\ i\in\Delta\}$, we write $uv=\{x_i,\ i\in\Lambda\cup\Delta\}$ for the concatenation of $u$ and $v$.

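For $\Lambda\subseteq S$ finite, we define the kernel $\mu_\Lambda$ from $(\Omega,\mathcal{F}_{S\setminus\Lambda})$ to $(\Omega,\mathcal{F})$ as follows:

$$\mu_\Lambda(\omega,A)\stackrel{\mathrm{def}}{=}\big(\mu^\Lambda\times\delta_{\omega_{S\setminus\Lambda}}\big)(A)=\mu^\Lambda\big(\{u\in\mathsf{X}^\Lambda:\ u\,\omega_{S\setminus\Lambda}\in A\}\big),\quad\omega\in\Omega,\ A\in\mathcal{F}.$$

In the above, $\delta_x$ is the Dirac mass at $x$ and $\mu^\Lambda$ denotes the product measure $\otimes_{s\in\Lambda}\mu$ on $(\mathsf{X}^\Lambda,\mathcal{E}^\Lambda)$. This kernel is best understood through its operation on bounded functions. If $f:\Omega\to\mathbb{R}$ is a bounded measurable function and $\omega\in\Omega$, we have

$$\mu_\Lambda f(\omega)\stackrel{\mathrm{def}}{=}\int\mu_\Lambda(\omega,dz)f(z)=\int_{\mathsf{X}^\Lambda}f(u\,\omega_{S\setminus\Lambda})\,\mu^\Lambda(du).$$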

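For an infinite matrix $\theta:S\times S\to\mathbb{R}$ and $\Lambda$ a finite subset of $S$, we define $\pi_{\theta,\Lambda}$, a probability kernel from $(\Omega,\mathcal{F}_{S\setminus\Lambda})$ to $(\Omega,\mathcal{F})$, by

$$\pi_{\theta,\Lambda}(\omega,dz)=\frac{1}{Z_{\theta,\Lambda}(\omega)}\exp\{H_{\theta,\Lambda}(z)\}\,\mu_\Lambda(\omega,dz),$$

where for $z\in\Omega$,

$$H_{\theta,\Lambda}(z)=\sum_{s\in\Lambda}\Big(A(z_s)+\theta(s,s)B_0(z_s)+\sum_{\ell\neq s:\ \ell\succ s\ \text{or}\ \ell\notin\Lambda}\theta(s,\ell)B(z_s,z_\ell)\Big).$$

The term $Z_{\theta,\Lambda}(\omega)=\mu_\Lambda(e^{H_{\theta,\Lambda}})(\omega)$ is the normalizing constant. We write $\pi_\theta=\{\pi_{\theta,\Lambda},\ \Lambda\subseteq S,\ \Lambda\text{ finite}\}$, assuming that each kernel $\pi_{\theta,\Lambda}$ is well defined. If $\mu$ is a probability measure on $(\Omega,\mathcal{F})$, $h:\Omega\to\mathbb{R}$ a $\mu$-integrable function and $\mathcal{G}$ a sub-sigma-algebra of $\mathcal{F}$, we denote by $\mu(h\,|\,\mathcal{G})$ the conditional expectation of $h$ given $\mathcal{G}$. An infinite-volume auto-model is a probability measure $\mu_\theta$ on $(\Omega,\mathcal{F})$ that is consistent with the family $\pi_\theta$ in the sense that

$$\mu_\theta(f\,|\,\mathcal{F}_{S\setminus\Lambda})(\cdot)=\int\pi_{\theta,\Lambda}(\cdot,dz)f(z),\quad\mu_\theta\text{-a.s.}, \qquad (12)$$

for any finite subset $\Lambda$ of $S$ and any bounded measurable function $f:(\Omega,\mathcal{F})\to\mathbb{R}$. Notice that (12) implies that $\mu_\theta\pi_{\theta,\Lambda}=\mu_\theta$, that is, each probability kernel in the family $\pi_\theta$ is invariant with respect to $\mu_\theta$. The probability measure $\mu_\theta$ is an example of a Gibbs measure. We call a random variable $X=\{X_s,\ s\in S\}$ with distribution $\mu_\theta$ an auto-model random field with distribution $\mu_\theta$ or with conditional specification $\pi_\theta$. It is well known that, given a conditional specification $\pi_\theta$, a consistent Gibbs measure does not always exist and, when it does, it is not necessarily unique. In the present case, infinite-volume auto-models exist. This follows, for example, from Georgii (1988), Theorem 4.23 (a).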



Proposition 2.1. Suppose that (3) holds and let $\theta:S\times S\to\mathbb{R}$ be an infinite matrix such that $\deg(s,\theta)<\infty$ for all $s\in S$. Then the set of probability measures $\mu_\theta$ that satisfy (12) is nonempty.
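For a finite $\mathsf{X}$ (counting measure) and a finite window $\Lambda$, $\pi_{\theta,\Lambda}f(\omega)$ can be evaluated by enumerating $\mathsf{X}^\Lambda$ while the sites outside $\Lambda$ stay frozen at $\omega$. The sketch below does this on a finite set of integer-labelled sites standing in for $S$ (an assumption made so that the example is computable), with $\theta$ stored sparsely; pairs inside $\Lambda$ are counted once and pairs linking $\Lambda$ to its complement once. The function names and the path-graph example are hypothetical.

```python
import itertools
import math


def hamiltonian(z, theta, window, A, B0, B):
    """H_{theta,Lambda}(z) for a finite window Lambda.

    theta  : dict site -> dict neighbour -> weight (symmetric; theta[s][s] is the B0 coefficient)
    z      : dict site -> value (a full configuration)
    window : set of sites forming Lambda
    """
    h = 0.0
    for s in window:
        h += A(z[s]) + theta[s].get(s, 0.0) * B0(z[s])
        for l, w in theta[s].items():
            if l == s:
                continue
            # pairs inside Lambda are counted once (l > s); pairs leaving Lambda once
            if l not in window or l > s:
                h += w * B(z[s], z[l])
    return h


def kernel_expectation(f, omega, theta, window, states, A, B0, B):
    """pi_{theta,Lambda} f(omega): resample the window, keep omega elsewhere."""
    sites = sorted(window)
    num = den = 0.0
    for values in itertools.product(states, repeat=len(sites)):
        z = dict(omega)
        z.update(zip(sites, values))
        w = math.exp(hamiltonian(z, theta, set(sites), A, B0, B))
        num += w * f(z)
        den += w
    return num / den


# Hypothetical example: auto-logistic model on the path 0-1-2-3-4, boundary fixed at 1
A = lambda u: 0.0
B0 = lambda u: u
B = lambda u, v: u * v
theta = {s: {s: -0.3} for s in range(5)}
for s in range(4):
    theta[s][s + 1] = theta[s + 1][s] = 0.8
omega = {s: 1 for s in range(5)}
print(kernel_expectation(lambda z: z[2], omega, theta, {1, 2, 3}, [0, 1], A, B0, B))
```

Averaging $\pi_{\theta,\{2\}}(\cdot,X_2)$ under $\pi_{\theta,\{1,2,3\}}$ returns the same value as above, which is the consistency property of the specification.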



2.2. Consistency: proof of Theorem 1.2. The theorem consists in showing that, for almost all sample paths, the function $U_n$ epi-converges to $\kappa(\theta_\star,\cdot)$. It is obtained through a slight modification of Theorem 5.1 of Hess (1996).

2.2.1. Preliminaries. Let $(V,d)$ be a Polish space with metric $d$ and Borel sigma-algebra $\mathcal{B}(V)$. Let $\{g,f_n,\ n\ge 1\}$ be a sequence of real-valued functions defined on $V$. The next lemma states that if $f_n$ can be written as $f_n=g_n+r_n$, where $r_n$ converges to zero in an appropriate sense, then if $g_n$ epi-converges to $g$, so does $f_n$. The proof is simple and is omitted. It also follows as a special case of Dal Maso (1993), Proposition 6.20.



Lemma 2.2. Suppose that $f_n=g_n+r_n$, where $\{g,r_n,g_n,\ n\ge 1\}$ are real-valued functions defined on $V$ such that $g_n$ epi-converges to $g$. Suppose that $|r_n(u)|\le c(1+d(u,0))a_n$ for all $u\in V$ and for some finite constant $c$, where $a_n\to 0$. Then $f_n$ epi-converges to $g$.

For a real-valued function $f$ on $V$ and an integer $k$, we define its Lipschitz approximation of order $k$ as $f^{(k)}(u)=\inf_{v\in V}\{f(v)+k\,d(u,v)\}$. The Lipschitz approximation $f^{(k)}$ is a Lipschitz function (with Lipschitz coefficient $k$). For any $u\in V$, the sequence $\{f^{(k)}(u),\ k\ge 1\}$ is nondecreasing and upper bounded by $f(u)$, and if $f$ itself is Lipschitz on $V$, $\sup_{k\ge 1}f^{(k)}(u)=f(u)$ (see e.g. Dal Maso (1993), Theorem 9.13).
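The Lipschitz approximation is an inf-convolution of $f$ with $k\,d(u,v)$, and its stated properties can be observed on a grid. The sketch below takes $V=\mathbb{R}$ with $d(u,v)=|u-v|$ and computes the infimum over a finite grid only (an approximation made for illustration); the step function is an arbitrary non-Lipschitz example and `lipschitz_approx` is a hypothetical name.

```python
def lipschitz_approx(f, k, grid):
    """f^(k)(u) = min over v in grid of f(v) + k |u - v|, evaluated at each u in grid."""
    fv = [f(v) for v in grid]
    return [min(fj + k * abs(u - v) for v, fj in zip(grid, fv)) for u in grid]


def step(x):
    """A non-Lipschitz example function on V = R."""
    return 0.0 if x < 0 else 1.0


grid = [i / 50 for i in range(-100, 101)]     # 201 points of [-2, 2]
approx = {k: lipschitz_approx(step, k, grid) for k in (1, 5, 25, 125)}
# Near the jump at 0, the approximations rise with k towards step(u)
print([round(approx[k][101], 3) for k in sorted(approx)])
```

For a function that is already $1$-Lipschitz, such as `abs`, the approximation of any order $k\ge 1$ reproduces the function exactly.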

Let $(E,\mathcal{E})$ be a measurable space. A more useful sigma-algebra to work with is $\hat{\mathcal{E}}$, the sigma-algebra of universally measurable subsets of $E$ with respect to $\mathcal{E}$: $\hat{\mathcal{E}}=\bigcap_\mu\mathcal{E}^\mu$, where $\mathcal{E}^\mu$ is the $\mu$-completion of $\mathcal{E}$ with respect to a $\sigma$-finite measure $\mu$ on $(E,\mathcal{E})$, and where the intersection is over all $\sigma$-finite measures on $(E,\mathcal{E})$. If $g:E\times V\to\mathbb{R}$ is a function, $x\in E$ and $k\ge1$, we denote by $g^{(k)}(x,\cdot)$ the Lipschitz approximation of order $k$ of $g(x,\cdot)$. It is known (Hess (1996), Proposition 4.4) that if $g$ is $\mathcal{E}\times\mathcal{B}(V)$-measurable, then for any $k\ge1$, $g^{(k)}$ is $\hat{\mathcal{E}}\times\mathcal{B}(V)$-measurable. The following result is taken from Hess (1993), Proposition 3.4.

Proposition 2.3. Let $\{g_n,n\ge1\}$ be a sequence of $\hat{\mathcal{E}}\times\mathcal{B}(V)$-measurable real-valued functions satisfying the following assumptions: there exist a finite constant $c\in(0,\infty)$ and $u_0\in V$ such that $g_n(x,u_0)=0$ for all $n\ge1$ and $x\in E$, and
$$\sup_{n\ge1}\sup_{x\in E}|g_n(x,u)-g_n(x,v)|\le c\,d(u,v),\quad u,v\in V. \tag{13}$$
For $x\in E$, let $\mathrm{li}_e\,g_n(x,\cdot)$ and $\mathrm{ls}_e\,g_n(x,\cdot)$ be the epi-limit inferior and superior of the sequence $\{g_n(x,\cdot),n\ge1\}$, respectively. Then, for all $u\in V$,
$$\mathrm{li}_e\,g_n(x,u)=\sup_{k\ge1}\liminf_{n\to\infty}g_n^{(k)}(x,u),\quad\text{and}\quad\mathrm{ls}_e\,g_n(x,u)=\sup_{k\ge1}\limsup_{n\to\infty}g_n^{(k)}(x,u).$$

Proposition 2.4. Let $\{g_n,n\ge1\}$ be as in Proposition 2.3 and let $\{X_k,k\ge1\}$ be a sequence of $E$-valued random variables defined on some probability space $(\Omega,\mathcal{A},\mathbb{P})$. Define
$$h_n(\omega,u)\stackrel{\mathrm{def}}{=}\frac1n\sum_{k=1}^ng_n(X_k(\omega),u),\quad n\ge1,\ \omega\in\Omega,\ u\in V.$$
Suppose that there exist a Lipschitz function $\phi:V\to\mathbb{R}$ and $N\subset\Omega$ with $\mathbb{P}(N)=0$ such that, for all $\omega\notin N$,
$$\lim_{n\to\infty}h_n(\omega,u)=\phi(u),\quad\text{for all }u\in V. \tag{14}$$
Then, for all $\omega\notin N$, $\mathrm{ls}_e\,h_n(\omega,u)\le\phi(u)$ for all $u\in V$, where $\mathrm{ls}_e\,h_n(\omega,\cdot)$ is the epi-limit superior of the functions $h_n(\omega,\cdot)$.



Proof. This result is part of Hess (1996), Theorem 5.1. We give the proof here for completeness. Fix $\omega\notin N$ and $u\in V$. The Lipschitz property (13) of $g_n$ transfers to $h_n$ and, by Proposition 2.3, $\mathrm{ls}_e\,h_n(\omega,u)=\sup_{k\ge1}\limsup_{n\to\infty}h_n^{(k)}(\omega,u)$. For any $k\ge1$, there exists a sequence $\{v_p,p\ge1\}$, $v_p=v_p(u,k)\in V$, such that $\phi^{(k)}(u)=\inf_{p\ge1}\{\phi(v_p)+k\,d(u,v_p)\}$. Then
$$\begin{aligned}
\limsup_{n\to\infty}h_n^{(k)}(\omega,u)&=\limsup_{n\to\infty}\inf_{v\in V}\{h_n(\omega,v)+k\,d(u,v)\}\le\inf_{v\in V}\limsup_{n\to\infty}\{h_n(\omega,v)+k\,d(u,v)\}\\
&\le\inf_{p\ge1}\limsup_{n\to\infty}\{h_n(\omega,v_p)+k\,d(u,v_p)\}=\inf_{p\ge1}\{\phi(v_p)+k\,d(u,v_p)\}=\phi^{(k)}(u).
\end{aligned}$$
Taking the supremum over $k$ on both sides and using $\sup_{k\ge1}\phi^{(k)}=\phi$ (since $\phi$ is Lipschitz) gives the result. □
We now consider the case where $\{X_k,k\ge1\}$ is an i.i.d. sequence.

Proposition 2.5. Let $\{g_n,n\ge1\}$ be as in Proposition 2.3. Suppose that there exists a real-valued, $\mathcal{E}\times\mathcal{B}(V)$-measurable function $g$ such that
$$\sup_{x\in E}|g(x,u)-g(x,v)|\le c\,d(u,v),\quad u,v\in V, \tag{15}$$
where $c$ can be taken as in (13), and, for any $(x,u)\in E\times V$,
$$\lim_{n\to\infty}|g_n(x,u)-g(x,u)|=0. \tag{16}$$
Let $\{X_k,k\ge1\}$ be a sequence of $E$-valued i.i.d. random variables defined on some probability space $(\Omega,\mathcal{A},\mathbb{P})$, and define $\phi(u)=\mathbb{E}(g(X_1,u))$. Then there exists a $\mathbb{P}$-negligible subset $N$ of $\Omega$ such that, for any $\omega\in\Omega\setminus N$,
$$h_n(\omega,\cdot)\ \text{epi-converges to}\ \phi,\quad\text{as }n\to\infty,$$
where $h_n(\omega,u)\stackrel{\mathrm{def}}{=}n^{-1}\sum_{k=1}^ng_n(X_k(\omega),u)$.

Proof. Notice that we can assume, without loss of generality, that the constant $c$ in (13) and (15) is smaller than 1. Otherwise simply divide $g$ and $g_n$ by $2c$, say. It follows from (13) and $g_n(x,u_0)=0$ that
$$\sup_{n\ge1}\sup_{x\in E}|g_n(x,u)|\le c\,d(u,u_0). \tag{17}$$
This implies that $g_n$ is bounded in $x$ and that $\phi_n(u)=\mathbb{E}(g_n(X_1,u))$ is well defined and uniformly bounded in $n$ for each $u$. Now, since $g_n(x,u)$ converges pointwise to $g(x,u)$, we can apply the Lebesgue dominated convergence theorem to conclude that $\phi_n(u)\to\phi(u)$ for each $u\in V$. Condition (15) also implies that $\phi$ is Lipschitz.

Furthermore, by the law of large numbers for arrays of independent random variables, for each $u\in V$ there exists a measurable set $N_1(u)\subset\Omega$ with $\mathbb{P}(N_1(u))=0$ such that, for all $\omega\notin N_1(u)$, $n^{-1}\sum_{k=1}^n\left(g_n(X_k(\omega),u)-\phi_n(u)\right)$ converges to zero. Since $\phi_n(u)\to\phi(u)$, and using (13) and the Polish assumption (which provides a countable dense subset of $V$), we conclude that there exists $N_1\subset\Omega$ with $\mathbb{P}(N_1)=0$ such that, for all $\omega\notin N_1$, $\lim_{n\to\infty}h_n(\omega,u)=\phi(u)$ for all $u\in V$. By Proposition 2.4, we obtain, for all $\omega\notin N_1$, $\mathrm{ls}_e\,h_n(\omega,u)\le\phi(u)$ for all $u\in V$.

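We will now show that there exists $N_2\subset\Omega$ with $\mathbb{P}(N_2)=0$ such that, for all $\omega\notin N_2$,
$$\liminf_{n\to\infty}h_n^{(k)}(\omega,u)\ge\mathbb{E}\left(g^{(k)}(X_1,u)\right),\quad\text{for all }u\in V,\ k\ge1. \tag{18}$$
By Proposition 2.3, we can then deduce that, for all $\omega\in\Omega\setminus N_2$ and all $u\in V$, $\mathrm{li}_e\,h_n(\omega,u)\ge\sup_{k\ge1}\mathbb{E}(g^{(k)}(X_1,u))=\lim_{k\to\infty}\mathbb{E}(g^{(k)}(X_1,u))=\phi(u)$, by dominated convergence. Together with the previous paragraph, this proves the result with $N=N_1\cup N_2$.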

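Let us show that (18) holds. Fix $u\in V$ and an integer $k\ge1$. Notice that $g_n^{(k)}(x,u)\le g_n(x,u)$ and, given the boundedness of $g_n^{(k)}$, we apply the law of large numbers to $g_n^{(k)}$ to conclude that there exists $N_2\subset\Omega$ with $\mathbb{P}(N_2)=0$ such that, for all $\omega\notin N_2$ (the first inequality holds because the Lipschitz approximation of an average dominates the average of the Lipschitz approximations),
$$\begin{aligned}
\liminf_{n\to\infty}h_n^{(k)}(\omega,u)&\ge\liminf_{n\to\infty}\frac1n\sum_{i=1}^ng_n^{(k)}(X_i(\omega),u)\\
&=\lim_{n\to\infty}\frac1n\sum_{i=1}^n\left(g_n^{(k)}(X_i(\omega),u)-\mathbb{E}\left(g_n^{(k)}(X_1,u)\right)\right)+\liminf_{n\to\infty}\mathbb{E}\left(g_n^{(k)}(X_1,u)\right)\\
&=\liminf_{n\to\infty}\mathbb{E}\left(g_n^{(k)}(X_1,u)\right).
\end{aligned}\tag{19}$$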

We obtain as a consequence of (13) that $|g_n(x,u)|\le c\,d(u,u_0)$ for all $n,x,u$. Consequently, for any $v\in V$ and $k\ge1$, $g_n(x,v)+k\,d(u,v)\ge-c\,d(v,u_0)+k\,d(u,v)\ge-c\,d(u,u_0)$. This shows that there exists a finite constant $C(u)$ (for example $c\,d(u,u_0)$) such that $g_n^{(k)}(x,u)+C(u)\ge0$ for all $x\in E$. By Fatou's lemma, we deduce that
$$\liminf_{n\to\infty}\mathbb{E}\left(g_n^{(k)}(X_1,u)\right)\ge\mathbb{E}\left(\liminf_{n\to\infty}g_n^{(k)}(X_1,u)\right). \tag{20}$$
Fix $x\in E$. Given $\epsilon>0$, we can find $v_0=v_0(x,n,u,\epsilon)\in V$ such that $g_n^{(k)}(x,u)\ge g_n(x,v_0)+k\,d(u,v_0)-\epsilon$. Because of (13), $v_0\in B(u,\epsilon/(1-c))$, where $B(x,r)$ is the closed ball of center $x$ and radius $r$. Indeed, if $d(u,v)>\epsilon/(1-c)$, then $g_n(x,v)+k\,d(u,v)\ge g_n(x,u)+(k-c)\,d(u,v)>g_n(x,u)+\epsilon\ge g_n^{(k)}(x,u)+\epsilon$. Thus
$$\begin{aligned}
g_n^{(k)}(x,u)&\ge g_n(x,v_0)+k\,d(u,v_0)-\epsilon\\
&=g(x,v_0)+k\,d(u,v_0)+\left(g(x,u)-g(x,v_0)\right)+\left(g_n(x,u)-g(x,u)\right)+\left(g_n(x,v_0)-g_n(x,u)\right)-\epsilon\\
&\ge g^{(k)}(x,u)+\left(g_n(x,u)-g(x,u)\right)-\frac{1+c}{1-c}\,\epsilon,
\end{aligned}$$
where the last step uses (13), (15) and $d(u,v_0)\le\epsilon/(1-c)$. Taking the liminf as $n\to\infty$ on both sides and letting $\epsilon\to0$, together with (16), gives $\liminf_{n\to\infty}g_n^{(k)}(x,u)\ge g^{(k)}(x,u)$. Combining this with (19) and (20) yields (18). □
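Proposition 2.5 is what underlies the consistency argument. The following Python sketch illustrates it in the simplest possible setting, with hypothetical choices that are not taken from the paper: $V=\mathbb{R}$ (restricted to a grid), $g_n=g$ with $g(x,u)=|u-x|-|x|$ (1-Lipschitz in $u$, with $g(x,0)=0$, so $u_0=0$), and $X_k$ i.i.d. exponential with mean 1, for which $\phi(u)=u-2+2e^{-u}$ for $u\ge0$. Epi-convergence is what typically allows one to pass from $h_n\to\phi$ to the convergence of minimizers; the sketch only checks this numerically by comparing the grid minimizer of $h_n$ with $\log2$, the minimizer of $\phi$.

```python
import math
import random

def g(x, u):
    # hypothetical integrand: 1-Lipschitz in u, with g(x, 0) = 0
    return abs(u - x) - abs(x)

def h_n(sample, u):
    # empirical average h_n(u) = n^{-1} * sum_k g(X_k, u)
    return sum(g(x, u) for x in sample) / len(sample)

def phi(u):
    # closed form of E g(X, u) for X ~ Exp(1) and u >= 0:
    # E|u - X| = u - 1 + 2 e^{-u} and E|X| = 1
    return u - 2.0 + 2.0 * math.exp(-u)

rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(5000)]
grid = [i / 100 for i in range(0, 201)]
values = [h_n(sample, u) for u in grid]
u_hat = grid[min(range(len(grid)), key=values.__getitem__)]
print("grid argmin of h_n:", u_hat, "  argmin of phi:", round(math.log(2), 4))
print("h_n(1) =", round(h_n(sample, 1.0), 3), "  phi(1) =", round(phi(1.0), 3))
```

Here the minimizer of $h_n$ is essentially the sample median, and the sketch is only meant to show the behaviour that epi-convergence formalizes; it does not verify the almost-sure statement itself.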



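2.2.2. Proof of Theorem 1.2. Write $U_n(\theta)=\bar U_n(\theta)+r_n(\theta)$, where
$$r_n(\theta)=n^{-1}\sum_{(s,\ell)}\left(q_{\lambda_n}(|\theta_\star(s,\ell)+\theta(s,\ell)|)-q_{\lambda_n}(|\theta_\star(s,\ell)|)\right).$$
Only the pairs in $\mathcal{E}^{(n)}(\theta)=\{(s,\ell):\theta(s,\ell)\neq0\}$ contribute to this sum, so that
$$|r_n(\theta)|=n^{-1}\Big|\sum_{(s,\ell)\in\mathcal{E}^{(n)}(\theta)}\left(q_{\lambda_n}(|\theta_\star(s,\ell)+\theta(s,\ell)|)-q_{\lambda_n}(|\theta_\star(s,\ell)|)\right)\Big|\le c\,n^{-1}\lambda_n\sum_{(s,\ell)}|\theta(s,\ell)|\le c\,n^{-1}\lambda_n\|\theta\|_1,$$
for some finite constant $c$. Thus Lemma 2.2 applies, and it is enough to show that, almost surely, $\bar U_n$ epi-converges to $\kappa(\theta_\star;\cdot)$.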

To do so, we apply Proposition 2.5. Take $E=\mathsf{X}^S$, with generic element $x=\{x(s),s\in S\}$, $V=\mathcal{M}_1$, and
$$g_n(x,\theta)=-\sum_{s\in D_n}\log\left(\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{\partial_ns})}{f^{(s)}_{\theta_\star}(x_s|x_{\partial_ns})}\right).$$
The limiting function $g$ is given by
$$g(x,\theta)=-\sum_{s\in S}\log\left(\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right).$$
We have seen earlier that, as a consequence of (29), $|g(x,\theta)|<\infty$. It is clear that $g_n$ is a real-valued normal integrand with $g_n(x,0)=0$, and it also follows from (29) and (3) that
$$\sup_{x\in E}|g(x,\theta)-g(x,\theta')|+\sup_{n\ge1}\sup_{x\in E}|g_n(x,\theta)-g_n(x,\theta')|\le C\|\theta-\theta'\|_1,$$
for some finite constant $C$. Thus (13) and (15) hold. It remains to show (16).
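As a sanity check on the Lipschitz bound just stated, the sketch below assumes a hypothetical Ising-type auto-model with $\mathsf{X}=\{-1,+1\}$, $A=0$, $B_0(u)=u$, $B(u,y)=uy$ and counting measure, so that $\|B_0\|_\infty=\|B\|_\infty=1$. For a single site $s$, (29) then gives $|\log f^{(s)}_{\theta_\star+\theta}(u|x)-\log f^{(s)}_{\theta_\star}(u|x)|\le2\sum_{\ell}|\theta(s,\ell)|$, and the code checks this on random draws. Summing such per-site bounds over $s$ is one way to arrive at a bound of the form $C\|\theta-\theta'\|_1$.

```python
import math
import random

def log_cond(theta_row, s, u, x):
    """log f^{(s)}(u | x) for a hypothetical Ising auto-model on {-1, +1}
    (A = 0, B_0(u) = u, B(u, y) = u * y, counting reference measure)."""
    field = theta_row[s] + sum(t * x[j] for j, t in enumerate(theta_row) if j != s)
    log_z = abs(field) + math.log1p(math.exp(-2.0 * abs(field)))  # log(e^field + e^-field)
    return u * field - log_z

rng = random.Random(1)
p, s = 6, 0
worst = 0.0
for _ in range(2000):
    theta_star = [rng.uniform(-1, 1) for _ in range(p)]   # row theta_star(s, .)
    delta = [rng.uniform(-1, 1) for _ in range(p)]        # row theta(s, .)
    x = [rng.choice((-1, 1)) for _ in range(p)]
    u = rng.choice((-1, 1))
    moved = [a + b for a, b in zip(theta_star, delta)]
    lhs = abs(log_cond(moved, s, u, x) - log_cond(theta_star, s, u, x))
    rhs = 2.0 * sum(abs(d) for d in delta)                # 2 * sum_l |theta(s, l)|
    worst = max(worst, lhs / rhs)
print("largest ratio |log-ratio| / bound:", round(worst, 4))
```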

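Consider $x\in\mathsf{X}^S$ and $\theta\in\mathcal{M}_1$. Since $\|\theta\|_1<\infty$, for any $\epsilon>0$ there exists a finite subset $A_\epsilon\subset S$ such that $\sum_{(u,v)\notin A_\epsilon\times A_\epsilon}|\theta(u,v)|<\epsilon$. We have
$$\begin{aligned}
|g_n(x,\theta)-g(x,\theta)|&\le\sum_{s\in D_n}\left|\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{\partial_ns})}{f^{(s)}_{\theta_\star}(x_s|x_{\partial_ns})}-\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right|\\
&\quad+\sum_{s\in S\setminus D_n}\left|\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right|.
\end{aligned}\tag{21}$$
We first deal with the second term on the right-hand side of (21). Fix $\epsilon>0$ and take $n$ large enough so that $A_\epsilon\subset D_n$. Then, using again (29) and (3), we have
$$\sum_{s\in S\setminus D_n}\left|\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right|\le c\sum_{s\in S\setminus D_n}\sum_{t\in S}|\theta(s,t)|\le c\,\epsilon.$$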






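The first term is obtained from the identity
$$\begin{aligned}
&\left(\log f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})-\log f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})\right)-\left(\log f^{(s)}_{\theta_\star+\theta}(x_s|x_{\partial_ns})-\log f^{(s)}_{\theta_\star}(x_s|x_{\partial_ns})\right)\\
&\quad=\sum_{\ell\in S}\theta(s,\ell)B_{s\ell}(x_s,x_\ell)-\int_0^1dt\int\sum_{\ell\in S}\theta(s,\ell)B_{s\ell}(u,x_\ell)\,f^{(s)}_{\theta_\star+t\theta}(u|x_{S\setminus\{s\}})\,du\\
&\qquad-\sum_{\ell\in D_n}\theta(s,\ell)B_{s\ell}(x_s,x_\ell)+\int_0^1dt\int\sum_{\ell\in D_n}\theta(s,\ell)B_{s\ell}(u,x_\ell)\,f^{(s)}_{\theta_\star+t\theta}(u|x_{\partial_ns})\,du,
\end{aligned}$$
where $B_{s\ell}(x,y)=B_0(x)$ if $\ell=s$ and $B_{s\ell}(x,y)=B(x,y)$ otherwise. The above equality follows from Lemma 2.6. We use it to conclude that there exists a finite constant $C$ such that
$$\begin{aligned}
&\left|\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{\partial_ns})}{f^{(s)}_{\theta_\star}(x_s|x_{\partial_ns})}-\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right|\\
&\quad\le C\sum_{\ell\in S\setminus D_n}|\theta(s,\ell)|+\sum_{\ell\in D_n}|\theta(s,\ell)|\left|\int_0^1dt\int B_{s\ell}(u,x_\ell)\left(f^{(s)}_{\theta_\star+t\theta}(u|x_{S\setminus\{s\}})-f^{(s)}_{\theta_\star+t\theta}(u|x_{\partial_ns})\right)du\right|.
\end{aligned}$$
Taking the sum over $s\in D_n=A_\epsilon\cup(D_n\setminus A_\epsilon)$, and using the choice of $A_\epsilon$ (every pair $(s,\ell)$ with $s\in D_n\setminus A_\epsilon$ or $\ell\in S\setminus D_n$ lies outside $A_\epsilon\times A_\epsilon$), we get
$$\begin{aligned}
&\sum_{s\in D_n}\left|\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{\partial_ns})}{f^{(s)}_{\theta_\star}(x_s|x_{\partial_ns})}-\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right|\\
&\quad\le C\epsilon+\sum_{s\in A_\epsilon}\sum_{\ell\in D_n}|\theta(s,\ell)|\left|\int_0^1dt\int B_{s\ell}(u,x_\ell)\left(f^{(s)}_{\theta_\star+t\theta}(u|x_{S\setminus\{s\}})-f^{(s)}_{\theta_\star+t\theta}(u|x_{\partial_ns})\right)du\right|.
\end{aligned}$$
For each $s$, the inner sum in the last term converges to 0 as $n\to\infty$. Since $A_\epsilon$ is finite, we conclude that
$$\limsup_{n\to\infty}\sum_{s\in D_n}\left|\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{\partial_ns})}{f^{(s)}_{\theta_\star}(x_s|x_{\partial_ns})}-\log\frac{f^{(s)}_{\theta_\star+\theta}(x_s|x_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star}(x_s|x_{S\setminus\{s\}})}\right|\le C\epsilon.$$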



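It follows that (16) holds. Finally, by conditioning on $X_{S\setminus\{s\}}$, we notice that
$$\mathbb{E}_\star\left(g(X,\theta)\right)=\sum_{s\in S}\mathbb{E}_\star\left[\int\log\left(\frac{f^{(s)}_{\theta_\star}(u|X_{S\setminus\{s\}})}{f^{(s)}_{\theta_\star+\theta}(u|X_{S\setminus\{s\}})}\right)f^{(s)}_{\theta_\star}(u|X_{S\setminus\{s\}})\,du\right]=\kappa(\theta_\star;\theta).$$
The theorem is proved. □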
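The truncation error handled in this proof can be seen in the same hypothetical Ising setting ($\mathsf{X}=\{-1,+1\}$, $B_0(u)=u$, $B(u,y)=uy$). In the sketch below, a finite chain of 201 sites stands in for $S$, the interactions $\theta(s,\ell)=0.4^{|s-\ell|+1}$ are an arbitrary summable choice, and $D_n$ is a window of half-width $n$ around $s$. For a fixed configuration, the code compares the full conditional $\log f(x_s|x_{S\setminus\{s\}})$ with the conditional that only sees $x_{\partial_ns}$, and checks that the difference is at most $2\sum_{\ell\notin D_n}|\theta(s,\ell)|$, the kind of quantity that makes the first term of (21) small as $D_n$ grows.

```python
import math
import random

def log_cond(field, u):
    # log f(u | .) = u * field - log(e^field + e^-field) for a hypothetical Ising conditional
    return u * field - (abs(field) + math.log1p(math.exp(-2.0 * abs(field))))

def theta(a, b):
    # hypothetical summable interactions on a chain
    return 0.4 ** (abs(a - b) + 1)

rng = random.Random(2)
N, s = 201, 100                                  # finite chain standing in for S
x = [rng.choice((-1, 1)) for _ in range(N)]
u = x[s]
full = theta(s, s) + sum(theta(s, j) * x[j] for j in range(N) if j != s)

errors, bounds = [], []
for n in (1, 2, 4, 8, 16):
    window = range(s - n, s + n + 1)             # observed set D_n around s
    trunc = theta(s, s) + sum(theta(s, j) * x[j] for j in window if j != s)
    errors.append(abs(log_cond(full, u) - log_cond(trunc, u)))
    bounds.append(2.0 * sum(theta(s, j) for j in range(N) if j not in window))
    print(n, f"{errors[-1]:.2e}", f"{bounds[-1]:.2e}")
```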
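2.2.3. Proof of Corollary 1.3. Let us first show that $\kappa(\theta_\star;\cdot)$ admits a unique minimum at 0. Since each $\kappa^{(s)}(\theta_\star;\cdot)$ is nonnegative, $\kappa(\theta_\star;\theta)=0$ implies that $\kappa^{(s)}(\theta_\star;\theta)=0$ for all $s\in S$. We use Lemma 2.6 to write
$$\begin{aligned}
-\log f^{(s)}_{\theta_\star+\theta}(X_s|X_{S\setminus\{s\}})+\log f^{(s)}_{\theta_\star}(X_s|X_{S\setminus\{s\}})&=-\sum_{\ell\in S}\theta(s,\ell)\left(B_{s\ell}(X_s,X_\ell)-\int B_{s\ell}(u,X_\ell)\,f^{(s)}_{\theta_\star}(u|X_{S\setminus\{s\}})\,du\right)\\
&\quad+\int\sum_{\ell\in S}\theta(s,\ell)B_{s\ell}(u,X_\ell)\int_0^1dt\left(f^{(s)}_{\theta_\star+t\theta}(u|X_{S\setminus\{s\}})-f^{(s)}_{\theta_\star}(u|X_{S\setminus\{s\}})\right)du.
\end{aligned}$$
Taking the expectation on both sides and using Lemma 2.6 again yields
$$\begin{aligned}
\kappa^{(s)}(\theta_\star;\theta)&=\int_0^1t\,dt\int_0^1dr\;\mathbb{E}_\star\left[\mathrm{Var}_{\theta_\star+tr\theta}\Big(\sum_{\ell\in S}\theta(s,\ell)B_{s\ell}(X_s,X_\ell)\,\Big|\,X_{S\setminus\{s\}}\Big)\right]\\
&=\int_0^1t\,dt\int_0^1dr\sum_{\ell,\ell'\in S}\theta(s,\ell)\theta(s,\ell')\rho_\theta(\ell,\ell'),
\end{aligned}$$
where $\rho_\theta(\ell,\ell')$ denotes the corresponding expected conditional covariance of $B_{s\ell}(X_s,X_\ell)$ and $B_{s\ell'}(X_s,X_{\ell'})$ (which also depends on $s$, $t$ and $r$). Since $\rho_\theta$ is positive definite, $\kappa^{(s)}(\theta_\star;\theta)=0$ if and only if $\theta(s,\ell)=0$ for all $\ell\in S$. Hence $\kappa(\theta_\star;\theta)=0$ implies $\theta=0$.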
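The variance representation of $\kappa^{(s)}$ can be checked numerically for one fixed value of $X_{S\setminus\{s\}}$. The sketch below again assumes the hypothetical $\{-1,+1\}$ Ising conditional, in which $\sum_\ell\theta(s,\ell)B_{s\ell}(u,x_\ell)=ua$ for a scalar $a$, so that the local field moves from $f_\star$ to $f_\star+a$. It compares the conditional Kullback-Leibler divergence $\mathrm{KL}(f_{\theta_\star}\,\|\,f_{\theta_\star+\theta})$ with a midpoint-rule approximation of $\int_0^1t\,dt\int_0^1dr\,\mathrm{Var}_{f_\star+tra}(ua)$; the two should agree up to quadrature error, and both vanish when $a=0$.

```python
import math

def log_z(field):
    # log(exp(field) + exp(-field)), computed stably
    return abs(field) + math.log1p(math.exp(-2.0 * abs(field)))

def cond_kl(f_star, a):
    """KL(f_{theta*} || f_{theta*+theta}) for a hypothetical Ising conditional on {-1, +1}
    whose local field moves from f_star to f_star + a."""
    p_plus = math.exp(f_star - log_z(f_star))
    kl = 0.0
    for u, p in ((1, p_plus), (-1, 1.0 - p_plus)):
        log_p0 = u * f_star - log_z(f_star)
        log_p1 = u * (f_star + a) - log_z(f_star + a)
        kl += p * (log_p0 - log_p1)
    return kl

def double_integral(f_star, a, m=200):
    """Midpoint rule for int_0^1 t dt int_0^1 dr Var_{f_star + t*r*a}(u * a)."""
    total = 0.0
    for i in range(m):
        t = (i + 0.5) / m
        for j in range(m):
            r = (j + 0.5) / m
            var_u = 1.0 - math.tanh(f_star + t * r * a) ** 2   # Var(u) when the field is f_star + t*r*a
            total += t * a * a * var_u
    return total / (m * m)

for f_star, a in ((0.3, 0.8), (-1.2, 2.0), (0.0, -0.5)):
    print(f_star, a, round(cond_kl(f_star, a), 6), round(double_integral(f_star, a), 6))
```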

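Now, let $\epsilon>0$. By tightness, there exists a compact subset $K$ of $\mathcal{M}_1$ such that $\sup_{n\ge1}\mathbb{P}_\star\left(\hat\theta_n-\theta_\star^{(n)}\notin K\right)<\epsilon$. Therefore
$$\begin{aligned}
\mathbb{P}_\star\left(\|\hat\theta_n-\theta_\star^{(n)}\|_1>\epsilon\right)&\le\epsilon+\mathbb{P}_\star\left(\hat\theta_n-\theta_\star^{(n)}\in K,\ \|\hat\theta_n-\theta_\star^{(n)}\|_1>\epsilon\right)\\
&\le\epsilon+\mathbb{P}_\star\left(\bigcup_{m\ge n}\left\{\hat\theta_m-\theta_\star^{(m)}\in K,\ \|\hat\theta_m-\theta_\star^{(m)}\|_1>\epsilon\right\}\right).
\end{aligned}$$
We conclude that
$$\limsup_{n\to\infty}\mathbb{P}_\star\left(\|\hat\theta_n-\theta_\star^{(n)}\|_1>\epsilon\right)\le\epsilon+\mathbb{P}_\star\left(\left\{\hat\theta_n-\theta_\star^{(n)}\in K,\ \|\hat\theta_n-\theta_\star^{(n)}\|_1>\epsilon\right\}\ \text{i.o.}\right).$$
Corollary 7.20 of Dal Maso (1993) and Theorem 1.2 imply that the probability on the right-hand side is zero. This ends the proof.

□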



2.3. Rate of convergence: proof of Theorem 1.4. The proof of the theorem is adapted from the treatment of the rate of convergence of M-estimators in Chapter 3.4 of van der Vaart and Wellner (1996). Fix $\epsilon>0$. Let $C,c_0<\infty$ be such that $\|B_0\|_\infty+\|B\|_\infty\le C$ and $\sup_{\lambda>0}\sup_{x>0}q'_\lambda(x)\le c_0$. Under the stated assumptions, $a_n^{1/2}r_na_n'^{-1}\lambda_nn^{-1}=O(1)$ as $n\to\infty$. Therefore, we can take $M\ge1$ large enough so that, for all $n\ge1$,
$$16\,c_0\,a_n'^{-1}r_na_n^{1/2}\lambda_nn^{-1}\le2^M,\qquad\text{and}\qquad16\sum_{j>M}2^{-j}<\epsilon. \tag{22}$$
For $j\ge1$, define $\mathcal{G}_{n,j}=\{\theta\in\mathcal{M}^{(n)}(a_n,\tau_n):2^{j-1}<r_n\|\theta\|_2\le2^j\}$. Clearly we have
$$\left\{r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2>2^M\right\}=\bigcup_{j>M}\left\{\hat\theta_n-\theta_\star^{(n)}\in\mathcal{G}_{n,j}\right\}.$$
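The peeling step splits the event $\{r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2>2^M\}$ into the shells $\mathcal{G}_{n,j}$. The following sketch, with hypothetical values standing in for $r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2$, computes the shell index $j$ with $2^{j-1}<r_n\|\theta\|_2\le2^j$, checks that the event is exactly the union over $j>M$, and finds the smallest $M$ with $16\sum_{j>M}2^{-j}=2^{4-M}<\epsilon$. Only this second requirement of (22) is encoded; the first depends on problem constants that are not modelled here.

```python
import math
import random

def shell_index(scaled_norm):
    """Index j >= 1 such that 2^(j-1) < scaled_norm <= 2^j, where scaled_norm stands for
    r_n * ||theta||_2 (assumed > 1)."""
    j = math.ceil(math.log2(scaled_norm))
    # guard against floating-point rounding at the shell boundaries
    while 2.0 ** j < scaled_norm:
        j += 1
    while j > 1 and 2.0 ** (j - 1) >= scaled_norm:
        j -= 1
    return j

def smallest_M(eps):
    """Smallest integer M >= 1 with 16 * sum_{j > M} 2^(-j) = 2^(4 - M) < eps."""
    M = 1
    while 2.0 ** (4 - M) >= eps:
        M += 1
    return M

M = smallest_M(0.05)
rng = random.Random(3)
for _ in range(10000):
    z = 2.0 ** rng.uniform(0.01, 20)        # hypothetical value of r_n * ||theta_hat - theta_star||_2
    j = shell_index(z)
    assert 2.0 ** (j - 1) < z <= 2.0 ** j
    # {z > 2^M} is exactly the union of the shells with j > M
    assert (z > 2.0 ** M) == (j > M)
print("M =", M)
```

For $\epsilon=0.05$ the sketch prints `M = 9`, since $2^{-5}<0.05\le2^{-4}$.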
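On the other hand, since $U_n$ attains its minimum at $\hat\theta_n-\theta_\star^{(n)}$ almost surely, and $U_n(0)=0$, it follows that $\{\hat\theta_n-\theta_\star^{(n)}\in\mathcal{G}_{n,j}\}\subseteq\{\inf_{\theta\in\mathcal{G}_{n,j}}U_n(\theta)\le0\}$. We conclude that
$$\mathbb{P}_\star\left(r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2>2^M\right)\le\sum_{j>M}\mathbb{P}_\star\left(\inf_{\theta\in\mathcal{G}_{n,j}}U_n(\theta)\le0\right). \tag{23}$$
We recall that
$$U_n(\theta)=n^{-1}\left(\ell_n(\theta_\star^{(n)})-\ell_n(\theta_\star^{(n)}+\theta)\right)+n^{-1}\sum_{(s,\ell)}\left(q_{\lambda_n}(|\theta_\star^{(n)}(s,\ell)+\theta(s,\ell)|)-q_{\lambda_n}(|\theta_\star^{(n)}(s,\ell)|)\right).$$
Set $\mathcal{E}^{(n)}(\theta)=\{(s,\ell):\theta(s,\ell)\neq0\}$. For $\theta\in\mathcal{G}_{n,j}$, using the mean value theorem, the assumptions on $q_\lambda$ and the Cauchy-Schwarz inequality,
$$\begin{aligned}
&n^{-1}\sum_{(s,\ell)}\left(q_{\lambda_n}(|\theta_\star^{(n)}(s,\ell)+\theta(s,\ell)|)-q_{\lambda_n}(|\theta_\star^{(n)}(s,\ell)|)\right)\\
&\quad\ge-n^{-1}\sum_{(s,\ell)\in\mathcal{E}^{(n)}(\theta_\star^{(n)})}\left|q_{\lambda_n}(|\theta_\star^{(n)}(s,\ell)+\theta(s,\ell)|)-q_{\lambda_n}(|\theta_\star^{(n)}(s,\ell)|)\right|\\
&\quad\ge-c_0\,n^{-1}\lambda_n\sum_{(s,\ell)\in\mathcal{E}^{(n)}(\theta_\star^{(n)})}|\theta(s,\ell)|\ge-c_0\,n^{-1}\lambda_na_n^{1/2}\|\theta\|_2\ge-c_0\,n^{-1}\lambda_na_n^{1/2}2^jr_n^{-1}.
\end{aligned}\tag{24}$$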



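Now, for $\theta\in\mathcal{M}^{(n)}(a_n,\tau_n)$, $n^{-1}\left(\ell_n(\theta_\star^{(n)})-\ell_n(\theta_\star^{(n)}+\theta)\right)=n^{-1}\sum_{i=1}^nm_{n,\theta}(X^{(i)})=n^{-1}\sum_{i=1}^n\bar m_{n,\theta}(X^{(i)})+M_n(\theta)$, where $\bar m_{n,\theta}(x)=m_{n,\theta}(x)-M_n(\theta)$, with $m_{n,\theta}$ as in (10), and
$$\begin{aligned}
M_n(\theta)&=\sum_{s\in D_n}\sum_{\ell\in D_n}\theta(s,\ell)\,\mathbb{E}_\star\left[\int_0^1dt\int B_{s\ell}(u,X_\ell)\left(f^{(s)}_{\theta_\star^{(n)}+t\theta}(u|X_{\partial_ns})-f^{(s)}_{\theta_\star^{(n)}}(u|X_{\partial_ns})\right)du\right]\\
&=\int_0^1t\,dt\int_0^1dr\sum_{s\in D_n}\mathbb{E}_\star\left[\mathrm{Var}_{\theta_\star^{(n)}+tr\theta}\Big(\sum_{\ell\in D_n}\theta(s,\ell)B_{s\ell}(X_s,X_\ell)\,\Big|\,X_{\partial_ns}\Big)\right]\ge\frac{a'_n}{2}\|\theta\|_2^2,
\end{aligned}$$
using Lemma 2.6 and the stated assumptions. Notice that the first part of (22) implies that $\frac{a'_n}{16}2^{2j}r_n^{-2}\ge c_0\,2^jr_n^{-1}a_n^{1/2}\lambda_nn^{-1}$ whenever $j>M$. Therefore, using (24) and $M_n(\theta)\ge\frac{a'_n}{2}2^{2(j-1)}r_n^{-2}=\frac{a'_n}{8}2^{2j}r_n^{-2}$ on $\mathcal{G}_{n,j}$,
$$\inf_{\theta\in\mathcal{G}_{n,j}}U_n(\theta)\ge\inf_{\theta\in\mathcal{G}_{n,j}}\left\{n^{-1}\sum_{i=1}^n\bar m_{n,\theta}(X^{(i)})\right\}+\frac{a'_n}{16}2^{2j}r_n^{-2},$$
and, by Markov's inequality, (23) becomes
$$\begin{aligned}
\mathbb{P}_\star\left(r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2>2^M\right)&\le\sum_{j>M}\mathbb{P}_\star\left(\sup_{\theta\in\mathcal{G}_{n,j}}\Big|n^{-1/2}\sum_{i=1}^n\bar m_{n,\theta}(X^{(i)})\Big|\ge\frac{\sqrt n\,a'_n2^{2j}}{16\,r_n^2}\right)\\
&\le\frac{16\,r_n^2}{\sqrt n\,a'_n}\sum_{j>M}2^{-2j}\left(\mathbb{E}^*\left(\|\mathbb{G}_n\|_{\mathcal{F}_{n,j}}\right)+\sup_{\theta\in\mathcal{G}_{n,j}}\sqrt n\left|\mathbb{E}_\star\left(m_{n,\theta}(X)\right)-M_n(\theta)\right|\right),
\end{aligned}\tag{25}$$
where $\mathbb{G}_n$ is the empirical process associated with the family $\mathcal{F}_{n,j}=\{m_{n,\theta}:\theta\in\mathcal{G}_{n,j}\}$: for $f\in\mathcal{F}_{n,j}$, $\mathbb{G}_n(f)=n^{-1/2}\sum_{i=1}^n\left(f(X^{(i)})-\mathbb{E}_\star f(X^{(1)})\right)$, and $\|\mathbb{G}_n\|_{\mathcal{F}_{n,j}}\stackrel{\mathrm{def}}{=}\sup_{f\in\mathcal{F}_{n,j}}|\mathbb{G}_n(f)|$.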
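Using the fact that $\mathbb{E}_\star\left(B_{s\ell}(X_s,X_\ell)\,|\,X_{S\setminus\{s\}}\right)=\int B_{s\ell}(u,X_\ell)\,f^{(s)}_{\theta_\star}(u|X_{S\setminus\{s\}})\,du$, $\mu_{\theta_\star}$-a.s., together with Lemma 2.6, we have
$$\begin{aligned}
\left|\mathbb{E}_\star\left(m_{n,\theta}(X)\right)-M_n(\theta)\right|&=\Big|\sum_{s\in\mathcal{A}^{(c)}}\sum_{\ell\in D_n}\theta(s,\ell)\,\mathbb{E}_\star\left[\int B_{s\ell}(u,X_\ell)\left(f^{(s)}_{\theta_\star}(u|X_{S\setminus\{s\}})-f^{(s)}_{\theta_\star^{(n)}}(u|X_{\partial_ns})\right)du\right]\Big|\\
&\le c\sum_{s\in\mathcal{A}^{(c)}}\Big(\sum_{\ell\in D_n}|\theta(s,\ell)|\Big)\Big(\sum_{\ell\in\partial s\setminus D_n}|\theta_\star(s,\ell)|\Big)\le b_n\|\theta\|_2\le b_nr_n^{-1}2^j,
\end{aligned}\tag{26}$$
where $\mathcal{A}^{(c)}=\{s\in D_n:\partial s\setminus D_n\neq\emptyset\}$. Notice that
$$|m_{n,\theta}(x)|\le2C\|\theta\|_1\le c\,\beta_{n,j},\quad\text{for all }\theta\in\mathcal{G}_{n,j},$$
where $\beta_{n,j}=a_n^{1/2}2^jr_n^{-1}$, for some finite constant $c$. Also, for $\theta\in\mathcal{G}_{n,j}$, the second part of the stated assumptions yields
$$\mathbb{E}_\star^{1/2}\left(m_{n,\theta}^2(X)\right)\le c\,a'_n\|\theta\|_2\le\delta_{n,j},$$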

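where $\delta_{n,j}=c\,a'_n2^jr_n^{-1}$, for some finite constant $c$. By Lemma 3.4.2 of van der Vaart and Wellner (1996),
$$\mathbb{E}^*\left(\|\mathbb{G}_n\|_{\mathcal{F}_{n,j}}\right)\le c\,J_{[\,]}\left(\delta_{n,j},\mathcal{F}_{n,j},L_2(\mu_{\theta_\star})\right)\left[1+\frac{\beta_{n,j}}{\sqrt n\,\delta_{n,j}^2}\,J_{[\,]}\left(\delta_{n,j},\mathcal{F}_{n,j},L_2(\mu_{\theta_\star})\right)\right],$$
for some finite constant $c$, where $J_{[\,]}(\delta_{n,j},\mathcal{F}_{n,j},L_2(\mu_{\theta_\star}))$ is the bracketing integral of the family $\mathcal{F}_{n,j}$, defined as
$$J_{[\,]}\left(\delta_{n,j},\mathcal{F}_{n,j},L_2(\mu_{\theta_\star})\right)=\int_0^{\delta_{n,j}}\sqrt{1+\log N_{[\,]}\left(\epsilon,\mathcal{F}_{n,j},L_2(\mu_{\theta_\star})\right)}\,d\epsilon.$$
For any $\theta,\theta'\in\mathcal{G}_{n,j}$, $|m_{n,\theta}(x)-m_{n,\theta'}(x)|\le c\|\theta-\theta'\|_1\le2c\,a_n^{1/2}\|\theta-\theta'\|_2$, for all $x$. This Lipschitz property of the family $\mathcal{F}_{n,j}$, Theorem 2.7.11 of van der Vaart and Wellner (1996) and (11) imply that
$$J_{[\,]}\left(\delta_{n,j},\mathcal{F}_{n,j},L_2(\mu_{\theta_\star})\right)\le c\,a_n^{1/2}\int_0^{\delta_{n,j}a_n^{-1/2}/4c}\sqrt{1+\log N\left(\epsilon,\mathcal{G}_{n,j},\|\cdot\|_2\right)}\,d\epsilon\le c\,\delta_{n,j}\sqrt{a_n\log p_n},$$
for some finite constant $c$. Under the assumption $a_n\sqrt{\log p_n}=O(a'_n\sqrt n)$, we obtain that $n^{-1/2}\beta_{n,j}\delta_{n,j}^{-2}J_{[\,]}(\delta_{n,j},\mathcal{F}_{n,j},L_2(\mu_{\theta_\star}))\le c\,n^{-1/2}a_n\sqrt{\log p_n}/a'_n=O(1)$. As a result, $\mathbb{E}^*\left(\|\mathbb{G}_n\|_{\mathcal{F}_{n,j}}\right)\le c\,\delta_{n,j}\sqrt{a_n\log p_n}$. Combined with (26), (25) and the expression of $r_n$, it follows that $\mathbb{P}_\star\left(r_n\|\hat\theta_n-\theta_\star^{(n)}\|_2>2^M\right)\le c\,\epsilon$ for some universal constant $c$. Since $\epsilon>0$ is arbitrary, the theorem follows.

□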
2.4. A comparison lemma.

Lemma 2.6. Let $(\mathsf{Y},\mathcal{A},\nu)$ be a measure space where $\nu$ is a finite measure. Let $g_1,g_2,f_1,f_2:\mathsf{Y}\to\mathbb{R}$ be bounded measurable functions. For $i\in\{1,2\}$, define $Z_i=\int e^{g_i(y)}\nu(dy)$. For $t\in[0,1]$, let $g_t(\cdot)=tg_2(\cdot)+(1-t)g_1(\cdot)$ and $Z_t=\int e^{g_t(y)}\nu(dy)$ (so that $g_t$ equals $g_1$ at $t=0$ and $g_2$ at $t=1$). Let $f_t:\mathsf{Y}\to\mathbb{R}$, $t\in[0,1]$, be such that $f_t=f_1$ at $t=0$ and $f_t=f_2$ at $t=1$. Suppose that $\frac{d}{dt}f_t(y)$ exists for $\nu$-almost all $y\in\mathsf{Y}$ and $\sup_{t\in[0,1],\,y\in\mathsf{Y}}\left|\frac{d}{dt}f_t(y)\right|<\infty$. Then
$$\begin{aligned}
\int f_2(y)e^{g_2(y)}Z_2^{-1}\nu(dy)-\int f_1(y)e^{g_1(y)}Z_1^{-1}\nu(dy)&=\int_0^1dt\int\left(\frac{d}{dt}f_t(y)\right)e^{g_t(y)}Z_t^{-1}\nu(dy)\\
&\quad+\int_0^1dt\,\mathrm{Cov}_t\left(f_t(X),(g_2-g_1)(X)\right),
\end{aligned}\tag{27}$$
where $\mathrm{Cov}_t(U_1(X),U_2(X))$ is the covariance between $U_1(X)$ and $U_2(X)$ assuming that $X\sim e^{g_t}Z_t^{-1}\,d\nu$.

Proof. Under the stated assumptions, the function $t\mapsto\int f_t(y)e^{g_t(y)}Z_t^{-1}\nu(dy)$ is differentiable under the integral sign and we have
$$\int f_2(y)e^{g_2(y)}Z_2^{-1}\nu(dy)-\int f_1(y)e^{g_1(y)}Z_1^{-1}\nu(dy)=\int_0^1\frac{d}{dt}\left(\int f_t(y)e^{g_t(y)}Z_t^{-1}\nu(dy)\right)dt.$$
The identity follows by carrying the differentiation under the integral sign. □
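A quick numerical check of (27) on a finite space may help. The sketch below assumes that $\mathsf{Y}$ has 7 points and that $\nu$ is counting measure, draws hypothetical bounded $g_1,g_2,f_1,f_2$ at random, takes the linear path $f_t=tf_2+(1-t)f_1$ (so $\frac{d}{dt}f_t=f_2-f_1$), and approximates the $dt$-integral by the midpoint rule. The two sides should agree up to quadrature error.

```python
import math
import random

def gibbs_mean(fun, g):
    """E[fun(X)] with X ~ exp(g) / Z on a finite set (counting measure nu)."""
    mx = max(g)
    w = [math.exp(v - mx) for v in g]   # shift by max(g) for numerical stability
    z = sum(w)
    return sum(wi * fi for wi, fi in zip(w, fun)) / z

rng = random.Random(4)
m = 7                                    # number of points in the finite space Y
g1 = [rng.uniform(-2, 2) for _ in range(m)]
g2 = [rng.uniform(-2, 2) for _ in range(m)]
f1 = [rng.uniform(-1, 1) for _ in range(m)]
f2 = [rng.uniform(-1, 1) for _ in range(m)]

lhs = gibbs_mean(f2, g2) - gibbs_mean(f1, g1)

steps, rhs = 4000, 0.0
dg = [b - a for a, b in zip(g1, g2)]     # g_2 - g_1
dft = [b - a for a, b in zip(f1, f2)]    # d/dt f_t along the linear path
for k in range(steps):
    t = (k + 0.5) / steps
    gt = [t * b + (1 - t) * a for a, b in zip(g1, g2)]
    ft = [t * b + (1 - t) * a for a, b in zip(f1, f2)]
    cov = gibbs_mean([x * y for x, y in zip(ft, dg)], gt) - gibbs_mean(ft, gt) * gibbs_mean(dg, gt)
    rhs += (gibbs_mean(dft, gt) + cov) / steps

print(round(lhs, 8), round(rhs, 8))
```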
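With the choice $f_t(y)=tf_2(y)+(1-t)f_1(y)$, we get
$$\left|\int f_2(y)e^{g_2(y)}Z_2^{-1}\nu(dy)-\int f_1(y)e^{g_1(y)}Z_1^{-1}\nu(dy)\right|\le\|f_2-f_1\|_\infty+2\left(\|f_1\|_\infty+\|f_2\|_\infty\right)\|g_2-g_1\|_\infty. \tag{28}$$
We will also need the following particular case. For bounded measurable functions $h_1,h_2:\mathsf{Y}\to\mathbb{R}$, we can take $f_i(y)=\log\int e^{h_i(u)}\nu(du)$, $i=1,2$, $f_t(y)=\log\int e^{th_2(u)+(1-t)h_1(u)}\nu(du)$, and $g_1=g_2$ in the lemma, and get
$$\log\int e^{h_2(y)}\nu(dy)-\log\int e^{h_1(y)}\nu(dy)=\int_0^1dt\,\frac{\int\left(h_2(u)-h_1(u)\right)e^{th_2(u)+(1-t)h_1(u)}\nu(du)}{\int e^{th_2(u)+(1-t)h_1(u)}\nu(du)}.$$
In particular,
$$\left|\log\int e^{h_2(y)}\nu(dy)-\log\int e^{h_1(y)}\nu(dy)\right|\le\|h_2-h_1\|_\infty. \tag{29}$$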
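Inequality (29) says that $h\mapsto\log\int e^{h}\,d\nu$ is 1-Lipschitz for the sup norm. The sketch below checks this for counting measure on random finite sets (a hypothetical setting chosen only for illustration), using a max-shifted log-sum-exp for numerical stability; the ratio $|\Delta\log Z|/\|h_2-h_1\|_\infty$ should never exceed 1.

```python
import math
import random

def log_integral_exp(h):
    """log sum_y exp(h(y)) for counting measure on a finite set (stable log-sum-exp)."""
    mx = max(h)
    return mx + math.log(sum(math.exp(v - mx) for v in h))

rng = random.Random(5)
worst = 0.0
for _ in range(5000):
    m = rng.randint(1, 10)
    h1 = [rng.uniform(-5, 5) for _ in range(m)]
    h2 = [v + rng.uniform(-1, 1) for v in h1]
    lhs = abs(log_integral_exp(h2) - log_integral_exp(h1))
    sup_diff = max(abs(a - b) for a, b in zip(h1, h2))
    worst = max(worst, lhs / sup_diff)
print("largest observed ratio:", round(worst, 4))
```

Equality is attained when $\mathsf{Y}$ is a single point, so the largest observed ratio is expected to be 1 up to rounding.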